Load global helper functions (including key.dummy(), which converts the key factor to dummy columns)
source("thesis_functions.R")
Data
data <- fread("kpopdata.csv")
data <- mutate(data, ArtistType = as.factor(ArtistType),
ArtistGender = as.factor(ArtistGender),
ArtistGen = factor(ArtistGen),
release_date = as.POSIXct(release_date, format = "%m/%d/%y"),
key = factor(key, levels = 0:11),
mode = as.factor(mode),
time_signature = factor(time_signature, levels = c(1,3,4,5)),
popular = factor(ifelse(popularity >=50, 1, 0)))
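As a quick sanity check on the recoding above (a sketch; assumes `data` was built exactly as shown), the factor levels and date class can be asserted directly:

```r
# Verify the recoded columns before modeling (uses `data` from above)
stopifnot(
  is.factor(data$key), nlevels(data$key) == 12,   # pitch classes 0-11
  is.factor(data$time_signature),
  inherits(data$release_date, "POSIXct"),         # parsed dates
  all(levels(data$popular) == c("0", "1"))        # binary indicator
)
```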
General assumptions: the response, popularity, is a continuous score ranging from 0 to 100; the predictors are a mix of categorical and continuous variables. The variables themselves are not normally distributed, so we will check the normality of the error terms where appropriate.
Goal: build a model for predicting popularity scores.
table(data$popular)/nrow(data)
##
## 0 1
## 0.8812021 0.1187979
If we classify a song as popular when its score is at least 50, only ~12% of the data counts as popular songs.
ggplot(data, aes(x=popularity)) +
geom_histogram(aes(y = stat(density)), bins = 30, fill = "white", col = "black") +
stat_function(
fun = dnorm,
args = list(mean = mean(data$popularity), sd = sd(data$popularity)),
lwd = 1,
col = 'red'
) +
xlab("Popularity") + ylab("Density") +
ggtitle(paste("Popularity", "Distribution")) +
theme_bw()
summary(data$popularity)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 11.00 23.00 25.88 39.00 94.00
#produce normal qqplot to assess normality
qqnorm(data$popularity, pch = 1, frame = FALSE)
qqline(data$popularity, col = "red", lwd = 2)
#Skewness score
skewness(data$popularity)
## [1] 0.5317036
kurtosis(data$popularity)
## [1] 2.530694
The distribution is moderately right-skewed.
A square root transformation makes it closer to normal, which helps meet the multiple linear regression assumptions.
ggplot(data, aes(x=popularity^0.5)) +
geom_histogram(aes(y = stat(density)), bins = 30, fill = "white", col = "black") +
stat_function(
fun = dnorm,
args = list(mean = mean(data$popularity^0.5), sd = sd(data$popularity^0.5)),
lwd = 1,
col = 'red'
) +
xlab("Popularity") + ylab("Density") +
ggtitle(paste("Popularity", "Distribution")) +
theme_bw()
summary(data$popularity^0.5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.317 4.796 4.679 6.245 9.695
#produce normal qqplot to assess normality
qqnorm(data$popularity^0.5, pch = 1, frame = FALSE)
qqline(data$popularity^0.5, col = "red", lwd = 2)
#Skewness score
skewness(data$popularity^0.5)
## [1] -0.3381829
kurtosis(data$popularity^0.5)
## [1] 2.532572
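The before/after skewness check can be generalized to compare several candidate transformations at once. This is a sketch using the same `skewness()` function loaded above; `log1p` is an extra, hypothetical candidate not considered elsewhere in the analysis:

```r
# Compare skewness of popularity under candidate transformations
compare_skew <- function(x, transforms = list(identity = identity,
                                              sqrt     = sqrt,
                                              log1p    = log1p)) {
  sapply(transforms, function(f) skewness(f(x)))
}
compare_skew(data$popularity)  # sqrt should land closest to 0 here
```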
Select just the audio features, then split by mode (0 = minor, 1 = major).
kpop <- select(data, popularity, duration, acousticness, danceability, energy, instrumentalness, key, loudness, mode, speechiness, tempo, valence)
kpop0 <- kpop %>% filter(mode == 0)%>% select(-mode)
kpop1 <- kpop %>% filter(mode == 1) %>% select(-mode)
### Kpop mode 0 train and test
smpl.size0 <- floor(0.75*nrow(kpop0))
set.seed(123)
smpl0 <- sample(nrow(kpop0), smpl.size0, replace = FALSE)
train0 <- kpop0[smpl0,]
test0 <- kpop0[-smpl0,]
### Kpop mode 1 train and test
smpl.size1 <- floor(0.75*nrow(kpop1))
set.seed(123)
smpl1 <- sample(nrow(kpop1), smpl.size1, replace = FALSE)
train1 <- kpop1[smpl1,]
test1 <- kpop1[-smpl1,]
fit a multiple linear regression model
ml0 <- lm(popularity ~. , data = train0)
summary(ml0)
##
## Call:
## lm(formula = popularity ~ ., data = train0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.283 -13.161 -1.606 11.489 64.381
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 98.430219 4.992479 19.716 < 2e-16 ***
## duration -0.090184 0.009317 -9.679 < 2e-16 ***
## acousticness -3.819305 2.053031 -1.860 0.062924 .
## danceability -2.192785 3.285433 -0.667 0.504544
## energy -33.971079 3.335048 -10.186 < 2e-16 ***
## instrumentalness -15.674945 4.376253 -3.582 0.000346 ***
## key1 -1.205989 1.499467 -0.804 0.421291
## key2 -2.729912 1.855982 -1.471 0.141416
## key3 3.040837 2.075808 1.465 0.143040
## key4 -4.346089 1.469828 -2.957 0.003129 **
## key5 -0.564903 1.426951 -0.396 0.692217
## key6 1.346808 1.489844 0.904 0.366062
## key7 0.041229 1.632769 0.025 0.979856
## key8 -1.573809 1.763610 -0.892 0.372251
## key9 -0.945472 1.512239 -0.625 0.531872
## key10 -1.757197 1.556351 -1.129 0.258955
## key11 -2.418422 1.389503 -1.740 0.081861 .
## loudness 3.556233 0.202714 17.543 < 2e-16 ***
## speechiness 23.779411 4.747553 5.009 5.75e-07 ***
## tempo -0.004290 0.012545 -0.342 0.732398
## valence -11.884888 1.788630 -6.645 3.51e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.34 on 3483 degrees of freedom
## Multiple R-squared: 0.143, Adjusted R-squared: 0.138
## F-statistic: 29.05 on 20 and 3483 DF, p-value: < 2.2e-16
par(mfrow = c(2,2))
plot(ml0)
Interpreting the diagnostic plots:
Residuals vs. Fitted: the line through this plot is roughly horizontal but shows a polynomial-like curve around the zero line, suggesting the linearity assumption may not fully hold.
Normal Q-Q: the residuals roughly follow the diagonal line, indicating that the residuals are approximately normally distributed.
Scale-Location: the red line does not run horizontally but trends slightly upward, so the residuals indicate a heteroscedasticity problem. In other words, we do not satisfy the MLR assumption of equal error variance.
Residuals vs. Leverage: several points exceed a standardized residual of 3, so there are many outliers.
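The visual impressions above can be backed by formal tests. This sketch assumes the `lmtest` package is installed (its `bptest()` implements the Breusch-Pagan test, which is not part of the analysis shown here):

```r
library(lmtest)

# Breusch-Pagan test: a small p-value is evidence against equal error
# variance, consistent with the Scale-Location plot
bptest(ml0)

# Shapiro-Wilk normality test on the residuals (valid since n < 5000)
shapiro.test(residuals(ml0))
```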
Try again after transforming popularity toward normality (square root):
ml0.sqrt <- lm(popularity ~. , data = train0 %>% mutate(popularity = popularity^0.5))
summary(ml0.sqrt)
##
## Call:
## lm(formula = popularity ~ ., data = train0 %>% mutate(popularity = popularity^0.5))
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9651 -1.2163 0.1704 1.3715 5.2110
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.8026945 0.5397564 23.719 < 2e-16 ***
## duration -0.0097617 0.0010073 -9.691 < 2e-16 ***
## acousticness -0.1418036 0.2219612 -0.639 0.522952
## danceability -0.1381154 0.3552010 -0.389 0.697420
## energy -3.9085950 0.3605651 -10.840 < 2e-16 ***
## instrumentalness -2.2443174 0.4731338 -4.744 2.18e-06 ***
## key1 -0.1306867 0.1621132 -0.806 0.420214
## key2 -0.3754834 0.2006575 -1.871 0.061392 .
## key3 0.1823478 0.2244238 0.813 0.416551
## key4 -0.5438271 0.1589089 -3.422 0.000628 ***
## key5 -0.1190871 0.1542733 -0.772 0.440212
## key6 0.1383520 0.1610728 0.859 0.390433
## key7 -0.0235720 0.1765251 -0.134 0.893779
## key8 -0.1566102 0.1906708 -0.821 0.411495
## key9 -0.1520276 0.1634941 -0.930 0.352505
## key10 -0.2351475 0.1682632 -1.397 0.162353
## key11 -0.2909129 0.1502246 -1.937 0.052885 .
## loudness 0.4225379 0.0219162 19.280 < 2e-16 ***
## speechiness 2.1574574 0.5132766 4.203 2.70e-05 ***
## tempo -0.0007701 0.0013563 -0.568 0.570207
## valence -1.1893401 0.1933758 -6.150 8.60e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.874 on 3483 degrees of freedom
## Multiple R-squared: 0.1594, Adjusted R-squared: 0.1546
## F-statistic: 33.03 on 20 and 3483 DF, p-value: < 2.2e-16
assumption checking and diagnostics
par(mfrow = c(2,2))
plot(ml0.sqrt)
Interpreting the diagnostic plots:
Residuals vs. Fitted: the line is roughly horizontal but still exhibits some curvature. Overall, we can conclude there is roughly a linear relationship between the predictors and the outcome.
Normal Q-Q: the residuals roughly follow the diagonal line, indicating approximate normality. Compared to the original scale, there is slightly more deviation in the upper tail, with points falling below the line.
Scale-Location: the red line is now roughly horizontal, so the residuals no longer show marked heteroscedasticity. In other words, the model now better satisfies the MLR assumption of equal error variance.
Residuals vs. Leverage: most points fall within standardized residual values of -3 and 3, so there are few outliers.
The full Multiple Linear Regression model can be written \[ \hat {\sqrt {popularity}} = 12.803 - 0.010{Duration} - 0.142{Acousticness} - 0.138{Danceability} - 3.909{Energy} - 2.244{Instrumentalness} \\ - 0.131{Key1} - 0.375{Key2} + 0.182{Key3} - 0.544{Key4} - 0.119{Key5} + 0.138{Key6} - 0.024{Key7} \\ - 0.157{Key8} - 0.152{Key9} - 0.235{Key10} - 0.291{Key11} + 0.423{Loudness} + 2.157{Speechiness} \\ -0.001{Tempo} - 1.189{Valence} \]
The variables that make a significant contribution to predicting the popularity score are Duration, Energy, Instrumentalness, Key4 (E Minor), Loudness, Speechiness, and Valence.
An increase in Duration, Acousticness, Danceability, Energy, Instrumentalness, Tempo, or Valence, and choosing Key1 (C#/D-flat Minor), Key2 (D Minor), Key4 (E Minor), Key5 (F Minor), Key7 (G Minor), Key8 (G#/A-flat Minor), Key9 (A Minor), Key10 (A#/B-flat Minor), or Key11 (B Minor) instead of Key0 (C Minor), decreases the square root popularity score on average. An increase in Loudness or Speechiness, and choosing Key3 (D#/E-flat Minor) or Key6 (F#/G-flat Minor) over Key0 (C Minor), increases the square root popularity score on average.
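Because the model predicts the square root of popularity, fitted values should be squared before reporting on the original 0-100 scale. A minimal sketch using `ml0.sqrt` and `test0` from above:

```r
# Predict on the sqrt scale, then square to recover popularity units
pred_sqrt <- predict(ml0.sqrt, newdata = test0)
pred_pop  <- pred_sqrt^2
summary(pred_pop)  # now comparable to the raw 0-100 popularity score
```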
prediction on test data
# prediction on test data
yhat.mlr = predict(ml0.sqrt, newdata = test0 %>% mutate(popularity = popularity^0.5))
# RMSE for test data
error.mlr <- yhat.mlr - test0$popularity^0.5
rmse.mlr <- sqrt(mean(error.mlr^2))
data.frame(
RMSE = RMSE(yhat.mlr, test0$popularity^0.5),
R2 = R2(yhat.mlr, test0$popularity^0.5)
)
## RMSE R2
## 1 1.863478 0.1332768
set.seed(123)
train_control <- trainControl(method = "cv",
number = 10)
bye <- capture.output(mlr.step_kcv <- train(popularity ~. , data = train0 %>% mutate(popularity = popularity^0.5),
method = "lmStepAIC", trControl = train_control))
print(mlr.step_kcv)
## Linear Regression with Stepwise Selection
##
## 3504 samples
## 10 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 3153, 3152, 3154, 3154, 3154, 3154, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 1.878512 0.1528544 1.516643
summary(mlr.step_kcv$finalModel)
##
## Call:
## lm(formula = .outcome ~ duration + energy + instrumentalness +
## key2 + key3 + key4 + key6 + key11 + loudness + speechiness +
## valence, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9636 -1.2044 0.1791 1.3736 5.2248
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.4039589 0.3759972 32.989 < 2e-16 ***
## duration -0.0097167 0.0009949 -9.766 < 2e-16 ***
## energy -3.8007377 0.3157733 -12.036 < 2e-16 ***
## instrumentalness -2.2440721 0.4710858 -4.764 1.98e-06 ***
## key2 -0.2588154 0.1619102 -1.599 0.1100
## key3 0.3053371 0.1903552 1.604 0.1088
## key4 -0.4241428 0.1056953 -4.013 6.12e-05 ***
## key6 0.2617396 0.1088858 2.404 0.0163 *
## key11 -0.1689374 0.0921533 -1.833 0.0669 .
## loudness 0.4214141 0.0217796 19.349 < 2e-16 ***
## speechiness 2.0999791 0.5079769 4.134 3.65e-05 ***
## valence -1.2328499 0.1723401 -7.154 1.03e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.873 on 3492 degrees of freedom
## Multiple R-squared: 0.1586, Adjusted R-squared: 0.156
## F-statistic: 59.84 on 11 and 3492 DF, p-value: < 2.2e-16
The Stepwise 10-fold Cross-Validation Multiple Linear Regression model can be written: \[ \hat {\sqrt {popularity}} = 12.404 - 0.010{Duration} - 3.801{Energy} - 2.244{Instrumentalness} - 0.259{Key2} + 0.305{Key3} - 0.424{Key4} \\ + 0.262{Key6} - 0.169{Key11} + 0.421{Loudness} + 2.100{Speechiness} - 1.233{Valence} \]
Compared to the full Multiple Linear Regression Model, the Stepwise method removes the variables: Acousticness, Danceability, Key1, Key5, Key7 - Key10, and Tempo.
The variables that make a significant contribution to predicting the popularity score are Duration, Energy, Instrumentalness, Key4 (E Minor), Key6 (F#/G-flat Minor), Loudness, Speechiness, and Valence.
An increase in Duration, Energy, Instrumentalness, or Valence, and choosing Key2 (D Minor), Key4 (E Minor), or Key11 (B Minor) instead of Key0 (C Minor), decreases the square root popularity score on average. An increase in Loudness or Speechiness, and choosing Key3 (D#/E-flat Minor) or Key6 (F#/G-flat Minor) over Key0 (C Minor), increases the square root popularity score on average.
prediction on test data
# prediction on test data
yhat.step_kcv = predict(mlr.step_kcv$finalModel, newdata=key.dummy(test0) %>% mutate(popularity = popularity^0.5))
# RMSE for test data
error.step_kcv <- yhat.step_kcv - test0$popularity^0.5
rmse.step_kcv <- sqrt(mean(error.step_kcv^2))
data.frame(
RMSE = RMSE(yhat.step_kcv, test0$popularity^0.5),
R2 = R2(yhat.step_kcv, test0$popularity^0.5)
)
## RMSE R2
## 1 1.861179 0.1350903
Both models have a test RMSE of about 1.86, which is reasonable given that the transformed popularity scores range from 0 to roughly 10.
lm.ridge0 <- train(popularity ~. , data = train0 %>% mutate(popularity = popularity^0.5), method = "glmnet",
trControl = train_control,
tuneGrid = expand.grid(alpha = 0,lambda = seq(0.001,0.1,by = 0.001)))
#Best Tuning Parameter
lm.ridge0$bestTune
## alpha lambda
## 47 0 0.047
# best coefficient
lm.ridge0.model <- coef(lm.ridge0$finalModel, lm.ridge0$bestTune$lambda)
lm.ridge0.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 12.3411323068
## duration -0.0096041085
## acousticness -0.0662308488
## danceability -0.1493986374
## energy -3.5601398635
## instrumentalness -2.2691735701
## key1 -0.0831090967
## key2 -0.3241117452
## key3 0.2260133231
## key4 -0.4931505942
## key5 -0.0698858769
## key6 0.1837187046
## key7 0.0173666066
## key8 -0.1059775161
## key9 -0.1084064701
## key10 -0.1844827951
## key11 -0.2447391761
## loudness 0.3990852347
## speechiness 2.0404769494
## tempo -0.0007694543
## valence -1.1748730700
The Ridge Multiple Linear Regularized Regression model can be written \[ \hat {\sqrt {popularity}} = 12.341 - 0.010{Duration} - 0.066{Acousticness} - 0.149{Danceability} - 3.560{Energy} - 2.269{Instrumentalness} \\ - 0.083{Key1} - 0.324{Key2} + 0.226{Key3} - 0.493{Key4} - 0.070{Key5} + 0.184{Key6} + 0.017{Key7} \\ - 0.106{Key8} - 0.108{Key9} - 0.184{Key10} - 0.245{Key11} + 0.399{Loudness} + 2.040{Speechiness} \\ -0.001{Tempo} - 1.175{Valence} \]
An increase in Duration, Acousticness, Danceability, Energy, Instrumentalness, Tempo, or Valence, and choosing Key1 (C#/D-flat Minor), Key2 (D Minor), Key4 (E Minor), Key5 (F Minor), Key8 (G#/A-flat Minor), Key9 (A Minor), Key10 (A#/B-flat Minor), or Key11 (B Minor) instead of Key0 (C Minor), decreases the square root of the popularity score on average. An increase in Loudness or Speechiness, and choosing Key3 (D#/E-flat Minor), Key6 (F#/G-flat Minor), or Key7 (G Minor) over Key0 (C Minor), increases the square root of the popularity score on average.
In comparison to the full MLR model, ridge shrinks the coefficients toward zero, and the coefficient of the Key7 dummy variable flips sign from negative to positive.
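The shrinkage can be seen directly by plotting glmnet's coefficient paths for the fitted ridge model, a sketch using the objects defined above:

```r
# Coefficient paths: each curve is one predictor shrinking toward zero
# as the penalty grows; the dashed line marks the selected lambda
plot(lm.ridge0$finalModel, xvar = "lambda", label = TRUE)
abline(v = log(lm.ridge0$bestTune$lambda), lty = 2)
```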
# prediction on test data
yhat.ridge0 = predict(lm.ridge0, s=lm.ridge0$bestTune,newdata=test0%>% mutate(popularity = popularity^0.5))
# RMSE for test data
data.frame(
RMSE = RMSE(yhat.ridge0, test0$popularity^0.5),
R2 = R2(yhat.ridge0, test0$popularity^0.5)
)
## RMSE R2
## 1 1.861093 0.1339601
lm.lasso0 <- train(popularity ~. , data = train0 %>% mutate(popularity = popularity^0.5), method = "glmnet",
trControl = train_control,
tuneGrid = expand.grid(alpha = 1,lambda = seq(0.001,0.1,by = 0.001)))
#Best Tuning Parameter
lm.lasso0$bestTune
## alpha lambda
## 17 1 0.017
# best coefficient
lm.lasso0.model <- coef(lm.lasso0$finalModel, lm.lasso0$bestTune$lambda)
lm.lasso0.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 11.926081916
## duration -0.009219647
## acousticness .
## danceability .
## energy -3.509181259
## instrumentalness -2.058245554
## key1 .
## key2 -0.161479016
## key3 0.216989939
## key4 -0.363371695
## key5 .
## key6 0.223544343
## key7 0.035935912
## key8 .
## key9 .
## key10 -0.044086831
## key11 -0.120863074
## loudness 0.399346845
## speechiness 1.775570123
## tempo .
## valence -1.140052989
The Lasso Multiple Linear Regularized Regression model can be written \[ \hat {\sqrt {popularity}} = 11.926 - 0.009{Duration} - 3.509{Energy} - 2.058{Instrumentalness} \\ - 0.161{Key2} + 0.217{Key3} - 0.363{Key4} + 0.224{Key6} + 0.036{Key7} - 0.044{Key10} \\ - 0.121{Key11} + 0.399{Loudness} + 1.776{Speechiness} - 1.140{Valence} \]
Compared to the full Multiple Linear Regression model, the Lasso model removes the variables Acousticness, Danceability, Key1, Key5, Key8, Key9, and Tempo.
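The dropped variables can be read off programmatically from the sparse coefficient matrix computed above, rather than by eye:

```r
# Predictors the lasso retained (nonzero coefficients, intercept excluded)
cf   <- as.matrix(lm.lasso0.model)
kept <- rownames(cf)[cf[, 1] != 0]
setdiff(kept, "(Intercept)")
```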
An increase in Duration, Energy, Instrumentalness, or Valence, and choosing Key2 (D Minor), Key4 (E Minor), Key10 (A#/B-flat Minor), or Key11 (B Minor) instead of Key0 (C Minor), decreases the square root popularity score on average. An increase in Loudness or Speechiness, and choosing Key3 (D#/E-flat Minor), Key6 (F#/G-flat Minor), or Key7 (G Minor) over Key0 (C Minor), increases the square root popularity score on average.
# prediction on test data
yhat.lasso0 = predict(lm.lasso0, s=lm.lasso0$bestTune,newdata=test0%>% mutate(popularity = popularity^0.5))
# RMSE for test data
error.lasso0 <- yhat.lasso0 - test0$popularity^0.5
rmse.lasso0 <- sqrt(mean(error.lasso0^2))
data.frame(
RMSE = RMSE(yhat.lasso0, test0$popularity^0.5),
R2 = R2(yhat.lasso0, test0$popularity^0.5)
)
## RMSE R2
## 1 1.858953 0.1351553
lm.enet0 <- train(popularity ~. , data = train0 %>% mutate(popularity = popularity^0.5), method = "glmnet",
trControl = train_control,
tuneGrid = expand.grid(alpha = seq(0,1,by = 0.05),lambda = seq(0.001,0.1,by = 0.001)))
# best parameter
lm.enet0$bestTune
## alpha lambda
## 2018 1 0.018
# best coefficient
lm.enet0.model <- coef(lm.enet0$finalModel, lm.enet0$bestTune$lambda)
lm.enet0.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 11.897380956
## duration -0.009188124
## acousticness .
## danceability .
## energy -3.491742638
## instrumentalness -2.047453789
## key1 .
## key2 -0.155458931
## key3 0.212147017
## key4 -0.359461075
## key5 .
## key6 0.221662630
## key7 0.032645442
## key8 .
## key9 .
## key10 -0.039857479
## key11 -0.117656255
## loudness 0.398035935
## speechiness 1.754828098
## tempo .
## valence -1.135217969
The Elastic Net Multiple Linear Regression model can be written \[ \hat {\sqrt {popularity}} = 11.897 - 0.009{Duration} - 3.492{Energy} - 2.047{Instrumentalness} - 0.155{Key2} \\ + 0.212{Key3} - 0.359{Key4} + 0.222{Key6} + 0.033{Key7} - 0.040{Key10} - 0.118{Key11} \\ + 0.398{Loudness} + 1.755{Speechiness} - 1.135{Valence} \]
Compared to the full Multiple Linear Regression model, the Elastic Net regression removes the variables Acousticness, Danceability, Key1, Key5, Key8, Key9, and Tempo. The Key7 dummy variable also has a positive coefficient rather than the negative coefficient of the full model.
An increase in Duration, Energy, Instrumentalness, or Valence, and choosing Key2 (D Minor), Key4 (E Minor), Key10 (A#/B-flat Minor), or Key11 (B Minor) instead of Key0 (C Minor), decreases the square root popularity score on average. An increase in Loudness or Speechiness, and choosing Key3 (D#/E-flat Minor), Key6 (F#/G-flat Minor), or Key7 (G Minor) instead of Key0 (C Minor), increases the square root popularity score on average.
# prediction on test data
yhat.enet0 = predict(lm.enet0, s=lm.enet0$bestTune,newdata=test0%>% mutate(popularity = popularity^0.5))
# RMSE for test data
error.enet0 <- yhat.enet0 - test0$popularity^0.5
rmse.enet0 <- sqrt(mean(error.enet0^2))
data.frame(
RMSE = RMSE(yhat.enet0, test0$popularity^0.5),
R2 = R2(yhat.enet0, test0$popularity^0.5)
)
## RMSE R2
## 1 1.858834 0.1351914
mlr0.results <- data.frame(Model = c("Full", "Step 10CV", "Ridge 10CV", "Lasso 10CV", "ElasticNet 10CV"),
#AIC = c(),
adjR2 = c(0.1546, 0.1560, NA, NA,NA),
RMSE = c(1.8635, 1.8612,1.8611, 1.8588, 1.8592),
Pred.R2 = c( 0.1333, 0.1351, 0.1340, 0.1352, 0.1351))
mlr0.results
## Model adjR2 RMSE Pred.R2
## 1 Full 0.1546 1.8635 0.1333
## 2 Step 10CV 0.1560 1.8612 0.1351
## 3 Ridge 10CV NA 1.8611 0.1340
## 4 Lasso 10CV NA 1.8588 0.1352
## 5 ElasticNet 10CV NA 1.8592 0.1351
The model that yields the lowest test RMSE and the highest predictive R2 is the Lasso Multiple Linear Regularized Regression model.
The Lasso Multiple Linear Regularized Regression model can be written \[ \hat {\sqrt {popularity}} = 11.926 - 0.009{Duration} - 3.509{Energy} - 2.058{Instrumentalness} \\ - 0.161{Key2} + 0.217{Key3} - 0.363{Key4} + 0.224{Key6} + 0.036{Key7} - 0.044{Key10} \\ - 0.121{Key11} + 0.399{Loudness} + 1.776{Speechiness} - 1.140{Valence} \]
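The winner can also be picked programmatically from the comparison table built above:

```r
# Select the row with the lowest test RMSE
best <- mlr0.results[which.min(mlr0.results$RMSE), ]
best$Model  # "Lasso 10CV"
```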
ml1 <- lm(popularity ~. , data = train1)
plot(ml1)
ml1.sqrt <- lm(popularity ~. , data = train1 %>% mutate(popularity = popularity^0.5))
plot(ml1.sqrt)
summary(ml1.sqrt)
##
## Call:
## lm(formula = popularity ~ ., data = train1 %>% mutate(popularity = popularity^0.5))
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4223 -1.2067 0.1313 1.3255 5.2394
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.8571767 0.4012433 27.059 < 2e-16 ***
## duration -0.0092489 0.0007447 -12.419 < 2e-16 ***
## acousticness 0.0805595 0.1482707 0.543 0.586927
## danceability 0.0508851 0.2545136 0.200 0.841542
## energy -3.0777912 0.2809079 -10.957 < 2e-16 ***
## instrumentalness -1.8772989 0.3529644 -5.319 1.09e-07 ***
## key1 0.3212935 0.0916230 3.507 0.000457 ***
## key2 0.2071664 0.0980728 2.112 0.034699 *
## key3 0.4306126 0.1503977 2.863 0.004210 **
## key4 -0.0291157 0.1272720 -0.229 0.819058
## key5 0.2042781 0.1117088 1.829 0.067504 .
## key6 0.2169456 0.1092982 1.985 0.047206 *
## key7 0.2832696 0.0911450 3.108 0.001894 **
## key8 0.2970985 0.1057360 2.810 0.004975 **
## key9 0.2708845 0.1134637 2.387 0.017001 *
## key10 0.0800665 0.1321708 0.606 0.544685
## key11 0.3899069 0.1176778 3.313 0.000928 ***
## loudness 0.3862563 0.0173867 22.216 < 2e-16 ***
## speechiness 4.5798143 0.4149651 11.037 < 2e-16 ***
## tempo -0.0014736 0.0009580 -1.538 0.124055
## valence -0.8492855 0.1545529 -5.495 4.08e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.821 on 5483 degrees of freedom
## Multiple R-squared: 0.1499, Adjusted R-squared: 0.1468
## F-statistic: 48.35 on 20 and 5483 DF, p-value: < 2.2e-16
The full Multiple Linear Regression model can be written \[ \hat {\sqrt {popularity}} = 10.857 - 0.009{Duration} + 0.081{Acousticness} + 0.051{Danceability} - 3.078{Energy} - 1.877{Instrumentalness} \\ + 0.321{Key1} + 0.207{Key2} + 0.431{Key3} - 0.029{Key4} + 0.204{Key5} + 0.217{Key6} + 0.283{Key7} \\ + 0.297{Key8} + 0.271{Key9} + 0.080{Key10} + 0.390{Key11} + 0.386{Loudness} + 4.580{Speechiness} \\ -0.001{Tempo} - 0.849{Valence} \]
The variables that make a significant contribution to predicting the popularity score are Duration, Energy, Instrumentalness, Key1 (C#/D-flat Major), Key2, Key3, Key6, Key7, Key8, Key9, Key11, Loudness, Speechiness, and Valence.
An increase in Duration, Energy, Instrumentalness, Tempo, or Valence, and choosing Key4 (E Major) instead of Key0 (C Major), decreases the square root popularity score on average. An increase in Acousticness, Danceability, Loudness, or Speechiness, and choosing Key1 (C#/D-flat Major), Key2 (D Major), Key3 (D#/E-flat Major), Key5 (F Major), Key6 (F#/G-flat Major), Key7 (G Major), Key8 (G#/A-flat Major), Key9 (A Major), Key10 (A#/B-flat Major), or Key11 (B Major) over Key0 (C Major), increases the square root popularity score on average.
prediction on test data
# prediction on test data
yhat.mlr = predict(ml1.sqrt, newdata = test1 %>% mutate(popularity = popularity^0.5))
# RMSE for test data
error.mlr <- yhat.mlr - test1$popularity^0.5
rmse.mlr <- sqrt(mean(error.mlr^2))
data.frame(
RMSE = RMSE(yhat.mlr, test1$popularity^0.5),
R2 = R2(yhat.mlr, test1$popularity^0.5)
)
## RMSE R2
## 1 1.858473 0.1257546
set.seed(123)
train_control <- trainControl(method = "cv",
number = 10)
bye <- capture.output(mlr.step_kcv <- train(popularity ~. , data = train1 %>% mutate(popularity = popularity^0.5),
method = "lmStepAIC", trControl = train_control))
print(mlr.step_kcv)
## Linear Regression with Stepwise Selection
##
## 5504 samples
## 10 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 4953, 4954, 4954, 4955, 4953, 4953, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 1.823965 0.1452973 1.468792
summary(mlr.step_kcv$finalModel)
##
## Call:
## lm(formula = .outcome ~ duration + energy + instrumentalness +
## key1 + key2 + key3 + key5 + key6 + key7 + key8 + key9 + key11 +
## loudness + speechiness + tempo + valence, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4396 -1.2039 0.1341 1.3270 5.2434
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.9848032 0.3075276 35.720 < 2e-16 ***
## duration -0.0092575 0.0007364 -12.571 < 2e-16 ***
## energy -3.1605194 0.2354845 -13.421 < 2e-16 ***
## instrumentalness -1.8844623 0.3524459 -5.347 9.31e-08 ***
## key1 0.3118350 0.0840432 3.710 0.000209 ***
## key2 0.1989538 0.0909328 2.188 0.028717 *
## key3 0.4250833 0.1449895 2.932 0.003384 **
## key5 0.1963190 0.1054161 1.862 0.062611 .
## key6 0.2098630 0.1030213 2.037 0.041690 *
## key7 0.2745519 0.0835910 3.284 0.001028 **
## key8 0.2885193 0.0992487 2.907 0.003663 **
## key9 0.2616042 0.1071561 2.441 0.014664 *
## key11 0.3831543 0.1116079 3.433 0.000601 ***
## loudness 0.3871730 0.0172524 22.442 < 2e-16 ***
## speechiness 4.5787531 0.4146235 11.043 < 2e-16 ***
## tempo -0.0015353 0.0009202 -1.668 0.095302 .
## valence -0.8365555 0.1361062 -6.146 8.49e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.82 on 5487 degrees of freedom
## Multiple R-squared: 0.1498, Adjusted R-squared: 0.1473
## F-statistic: 60.42 on 16 and 5487 DF, p-value: < 2.2e-16
The Stepwise Multiple Linear Regression model can be written: \[ \hat {\sqrt {popularity}} = 10.985 - 0.009{Duration} - 3.161{Energy} - 1.884{Instrumentalness} \\ + 0.312{Key1} + 0.199{Key2} + 0.425{Key3} + 0.196{Key5} + 0.210{Key6} + 0.275{Key7} \\ + 0.289{Key8} + 0.262{Key9} + 0.383{Key11} + 0.387{Loudness} + 4.579{Speechiness} \\ -0.002{Tempo} - 0.837{Valence} \]
Compared to the full multiple linear regression model, the stepwise model removes the variables Acousticness, Danceability, Key4, and Key10.
The variables that make a significant contribution to predicting the popularity score are Duration, Energy, Instrumentalness, Key1 (C#/D-flat Major), Key2, Key3, Key6, Key7, Key8, Key9, Key11, Loudness, Speechiness, and Valence.
An increase in Duration, Energy, Instrumentalness, Tempo, or Valence decreases the square root popularity score on average. An increase in Loudness or Speechiness, and choosing Key1 (C#/D-flat Major), Key2 (D Major), Key3 (D#/E-flat Major), Key5 (F Major), Key6 (F#/G-flat Major), Key7 (G Major), Key8 (G#/A-flat Major), Key9 (A Major), or Key11 (B Major) over Key0 (C Major), increases the square root popularity score on average.
prediction on test data
# prediction on test data
yhat.step_kcv = predict(mlr.step_kcv$finalModel, newdata=key.dummy(test1) %>% mutate(popularity = popularity^0.5))
# RMSE for test data
error.step_kcv <- yhat.step_kcv - test1$popularity^0.5
rmse.step_kcv <- sqrt(mean(error.step_kcv^2))
data.frame(
RMSE = RMSE(yhat.step_kcv, test1$popularity^0.5),
R2 = R2(yhat.step_kcv, test1$popularity^0.5)
)
## RMSE R2
## 1 1.858518 0.1257011
lm.ridge1 <- train(popularity ~. , data = train1 %>% mutate(popularity = popularity^0.5), method = "glmnet",
trControl = train_control,
tuneGrid = expand.grid(alpha = 0,lambda = seq(0.001,0.1,by = 0.001)))
#Best Tuning Parameter
lm.ridge1$bestTune
## alpha lambda
## 45 0 0.045
# best coefficient
lm.ridge1.model <- coef(lm.ridge1$finalModel, lm.ridge1$bestTune$lambda)
lm.ridge1.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 10.347839763
## duration -0.009063271
## acousticness 0.169898922
## danceability 0.049633951
## energy -2.620990536
## instrumentalness -1.927744817
## key1 0.293618095
## key2 0.182685249
## key3 0.399431145
## key4 -0.049409156
## key5 0.179350934
## key6 0.188771865
## key7 0.259698734
## key8 0.273385099
## key9 0.249069582
## key10 0.051821098
## key11 0.361138772
## loudness 0.355806469
## speechiness 4.325244621
## tempo -0.001381351
## valence -0.858954968
The Ridge Multiple Linear Regularized Regression model can be written: \[ \hat {\sqrt {popularity}} = 10.348 - 0.009{Duration} + 0.170{Acousticness} + 0.050{Danceability} - 2.621{Energy} - 1.928{Instrumentalness} \\ + 0.294{Key1} + 0.183{Key2} + 0.399{Key3} - 0.049{Key4} + 0.179{Key5} + 0.189{Key6} + 0.260{Key7} \\ + 0.273{Key8} + 0.249{Key9} + 0.052{Key10} + 0.361{Key11} + 0.356{Loudness} + 4.325{Speechiness} \\ -0.001{Tempo} - 0.859{Valence} \]
An increase in Duration, Energy, Instrumentalness, Tempo, or Valence, and choosing Key4 (E Major) instead of Key0 (C Major), decreases the square root popularity score on average. An increase in Acousticness, Danceability, Loudness, or Speechiness, and choosing Key1 (C#/D-flat Major), Key2 (D Major), Key3 (D#/E-flat Major), Key5 (F Major), Key6 (F#/G-flat Major), Key7 (G Major), Key8 (G#/A-flat Major), Key9 (A Major), Key10 (A#/B-flat Major), or Key11 (B Major) over Key0 (C Major), increases the square root popularity score on average.
# prediction on test data
yhat.ridge1 = predict(lm.ridge1, s=lm.ridge1$bestTune,newdata=test1%>% mutate(popularity = popularity^0.5))
# RMSE for test data
data.frame(
RMSE = RMSE(yhat.ridge1, test1$popularity^0.5),
R2 = R2(yhat.ridge1, test1$popularity^0.5)
)
## RMSE R2
## 1 1.858182 0.1253763
lm.lasso1 <- train(popularity ~. , data = train1 %>% mutate(popularity = popularity^0.5), method = "glmnet",
trControl = train_control,
tuneGrid = expand.grid(alpha = 1,lambda = seq(0.001,0.1,by = 0.001)))
#Best Tuning Parameter
lm.lasso1$bestTune
## alpha lambda
## 3 1 0.003
# best coefficient
lm.lasso1.model <- coef(lm.lasso1$finalModel, lm.lasso1$bestTune$lambda)
lm.lasso1.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 10.823106069
## duration -0.009165071
## acousticness 0.078797866
## danceability 0.007432375
## energy -3.012459438
## instrumentalness -1.855427319
## key1 0.268514415
## key2 0.153367880
## key3 0.368124062
## key4 -0.060812240
## key5 0.148197878
## key6 0.160460653
## key7 0.231766196
## key8 0.242516531
## key9 0.215811156
## key10 0.020377237
## key11 0.334220735
## loudness 0.381054835
## speechiness 4.504126435
## tempo -0.001394241
## valence -0.829567095
The Lasso Multiple Linear Regression model can be written \[ \hat {\sqrt {popularity}} = 10.823 - 0.010{Duration} + 0.079{Acousticness} + 0.007{Danceability} - 3.012{Energy} - 1.855{Instrumentalness} \\ + 0.269{Key1} + 0.153{Key2} + 0.368{Key3} - 0.061{Key4} + 0.148{Key5} + 0.160{Key6} + 0.232{Key7} \\ + 0.243{Key8} + 0.216{Key9} + 0.020{Key10} + 0.334{Key11} + 0.381{Loudness} + 4.504{Speechiness} \\ -0.001{Tempo} - 0.830{Valence} \]
Compared to the full multiple linear regression model, all variables are kept in the model with lasso, but the coefficients are shrunk closer to zero.
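The shrinkage toward zero comes from lasso's soft-thresholding operator; a base-R sketch of that elementwise operation (assuming standardized predictors, as glmnet uses internally):

```r
# Soft-thresholding: the operation behind lasso's coefficient shrinkage.
soft <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)
soft(c(-0.80, 0.002, 0.50), lambda = 0.003)
# coefficients smaller than lambda are zeroed; larger ones shrink by lambda
```

With the small lambda selected here (0.003), no coefficient is zeroed, which is why all variables survive in the lasso fit above.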
An increase in Duration, Energy, Instrumentalness, Tempo, or Valence, or choosing Key4 (E Major) instead of Key0 (C Major), decreases the square root popularity score, on average. An increase in Acousticness, Danceability, Loudness, or Speechiness, or choosing Key1 (C#/D-flat Major), Key2 (D Major), Key3 (D#/E-flat Major), Key5 (F Major), Key6 (F#/G-flat Major), Key7 (G Major), Key8 (G#/A-flat Major), Key9 (A Major), Key10 (A#/B-flat Major), or Key11 (B Major) over Key0 (C Major), increases it, on average.
# prediction on test data
yhat.lasso1 <- predict(lm.lasso1, s = lm.lasso1$bestTune, newdata = test1 %>% mutate(popularity = popularity^0.5))
# RMSE for test data
error.lasso1 <- yhat.lasso1 - test1$popularity^0.5
rmse.lasso1 <- sqrt(mean(error.lasso1^2))
data.frame(
RMSE = RMSE(yhat.lasso1, test1$popularity^0.5),
R2 = R2(yhat.lasso1, test1$popularity^0.5)
)
## RMSE R2
## 1 1.858393 0.1255584
lm.enet1 <- train(popularity ~. , data = train1 %>% mutate(popularity = popularity^0.5), method = "glmnet",
trControl = train_control,
tuneGrid = expand.grid(alpha = seq(0,1,by = 0.05),lambda = seq(0.001,0.1,by = 0.001)))
# best parameter
lm.enet1$bestTune
## alpha lambda
## 507 0.25 0.007
# best coefficient
lm.enet1.model <- coef(lm.enet1$finalModel, lm.enet1$bestTune$lambda)
lm.enet1.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 10.770793937
## duration -0.009180508
## acousticness 0.092081531
## danceability 0.026587503
## energy -2.978939887
## instrumentalness -1.872237837
## key1 0.287645952
## key2 0.173244638
## key3 0.390821662
## key4 -0.049747724
## key5 0.169007388
## key6 0.181032607
## key7 0.250833300
## key8 0.262900268
## key9 0.236604652
## key10 0.042176217
## key11 0.354397091
## loudness 0.379348713
## speechiness 4.503674675
## tempo -0.001415713
## valence -0.840584039
The Elastic Net Multiple Linear Regression model can be written \[ \hat {\sqrt {popularity}} = 10.771 - 0.010{Duration} + 0.092{Acousticness} + 0.027{Danceability} - 2.979{Energy} - 1.872{Instrumentalness} \\ + 0.288{Key1} + 0.173{Key2} + 0.391{Key3} - 0.050{Key4} + 0.169{Key5} + 0.181{Key6} + 0.251{Key7} \\ + 0.263{Key8} + 0.237{Key9} + 0.042{Key10} + 0.354{Key11} + 0.379{Loudness} + 4.504{Speechiness} \\ -0.001{Tempo} - 0.841{Valence} \]
An increase in Duration, Energy, Instrumentalness, Tempo, or Valence, or choosing Key4 (E Major) instead of Key0 (C Major), decreases the square root popularity score, on average. An increase in Acousticness, Danceability, Loudness, or Speechiness, or choosing Key1 (C#/D-flat Major), Key2 (D Major), Key3 (D#/E-flat Major), Key5 (F Major), Key6 (F#/G-flat Major), Key7 (G Major), Key8 (G#/A-flat Major), Key9 (A Major), Key10 (A#/B-flat Major), or Key11 (B Major) over Key0 (C Major), increases it, on average.
# prediction on test data
yhat.enet1 <- predict(lm.enet1, s = lm.enet1$bestTune, newdata = test1 %>% mutate(popularity = popularity^0.5))
# RMSE for test data
error.enet1 <- yhat.enet1 - test1$popularity^0.5
rmse.enet1 <- sqrt(mean(error.enet1^2))
data.frame(
RMSE = RMSE(yhat.enet1, test1$popularity^0.5),
R2 = R2(yhat.enet1, test1$popularity^0.5)
)
## RMSE R2
## 1 1.858296 0.1256477
mlr1.results <- data.frame(Model = c("Full", "Step 10CV", "Ridge 10CV", "Lasso 10CV", "ElasticNet 10CV"),
#AIC = c(),
adjR2 = c(0.1468, 0.1453, NA, NA, NA),
RMSE = c(1.8585, 1.8585, 1.8582, 1.8584, 1.8583),
Pred.R2 = c(0.1258, 0.1257, 0.1254, 0.1256, 0.1256))
mlr1.results
## Model adjR2 RMSE Pred.R2
## 1 Full 0.1468 1.8585 0.1258
## 2 Step 10CV 0.1453 1.8585 0.1257
## 3 Ridge 10CV NA 1.8582 0.1254
## 4 Lasso 10CV NA 1.8584 0.1256
## 5 ElasticNet 10CV NA 1.8583 0.1256
The differences across models are negligible: the Ridge model attains the lowest test RMSE (1.8582), while the full model attains the highest predictive R2 (0.1258).
For the logistic models, split the data 70% train, 15% validation, 15% test, using the validation set to find the optimal cutoff value.
kpop.logit <- select(data, popular, duration, acousticness, danceability, energy, instrumentalness, key, loudness, mode, speechiness, tempo, valence)
logit.kpop0 <- kpop.logit %>% filter(mode == 0)%>% select(-mode)
logit.kpop1 <- kpop.logit %>% filter(mode == 1) %>% select(-mode)
### Kpop mode 0 train and test
set.seed(123)
p3 <- partition.3(logit.kpop0, 0.70, 0.15)
logit.train0 <- p3$data.train
logit.valid0 <- p3$data.val
logit.test0 <- p3$data.test
all.train0 <- rbind(logit.train0, logit.valid0)
### Kpop mode 1 train and test
set.seed(123)
p3 <- partition.3(logit.kpop1, 0.70, 0.15)
logit.train1 <- p3$data.train
logit.valid1 <- p3$data.val
logit.test1 <- p3$data.test
all.train1 <- rbind(logit.train1, logit.valid1)
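partition.3() is defined in thesis_functions.R; a base-R sketch of such a three-way random split (the real helper may differ in details):

```r
# Hypothetical re-implementation of a 70/15/15 random partition.
partition.3.sketch <- function(df, p.train, p.val) {
  n    <- nrow(df)
  idx  <- sample(n)               # shuffle row indices
  n.tr <- floor(p.train * n)
  n.va <- floor(p.val * n)
  list(data.train = df[idx[seq_len(n.tr)], , drop = FALSE],
       data.val   = df[idx[n.tr + seq_len(n.va)], , drop = FALSE],
       data.test  = df[idx[(n.tr + n.va + 1):n], , drop = FALSE])
}
set.seed(123)
p <- partition.3.sketch(data.frame(x = 1:100), 0.70, 0.15)
sapply(p, nrow)  # train 70, validation 15, test 15
```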
Fit the logistic model using the train/validation/test split, finding the optimal model using the training data.
# Fit logistic model on training data
v.logit.model0 <- glm(popular ~ ., family=binomial(link='logit'),data= logit.train0)
#search for best cutoff
out0 <- opt.cut.func(v.logit.model0, logit.valid0)
opt.cut.plot(out0)
out0$cutoff[which.min(out0$ssdiff.vec)]
## [1] 0.14
v.opt.cut0 <- out0$cutoff[which.max(out0$kappa.vec)]
v.opt.cut0
## [1] 0.18
Fit final model (combo of train and validation)
v.model.final <- glm(popular ~ ., data=all.train0, family=binomial(link='logit'))
summary(v.model.final)
##
## Call:
## glm(formula = popular ~ ., family = binomial(link = "logit"),
## data = all.train0)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4700 -0.5782 -0.4585 -0.3339 2.6461
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.0594524 0.8743667 5.786 7.19e-09 ***
## duration -0.0086620 0.0016174 -5.355 8.54e-08 ***
## acousticness -1.4718150 0.3559297 -4.135 3.55e-05 ***
## danceability 0.3111716 0.5329309 0.584 0.5593
## energy -3.7232087 0.5626631 -6.617 3.66e-11 ***
## instrumentalness -2.8085598 1.8548502 -1.514 0.1300
## key1 0.1109165 0.2538918 0.437 0.6622
## key2 -0.0875524 0.3211703 -0.273 0.7852
## key3 0.7722710 0.3100048 2.491 0.0127 *
## key4 -0.3270021 0.2657016 -1.231 0.2184
## key5 0.1797736 0.2418359 0.743 0.4573
## key6 0.3135429 0.2457043 1.276 0.2019
## key7 0.1753454 0.2759348 0.635 0.5251
## key8 0.2184072 0.2954494 0.739 0.4598
## key9 0.0318705 0.2590589 0.123 0.9021
## key10 0.1516816 0.2613415 0.580 0.5616
## key11 0.0590538 0.2393927 0.247 0.8052
## loudness 0.3497156 0.0428642 8.159 3.39e-16 ***
## speechiness 3.4530879 0.6920940 4.989 6.06e-07 ***
## tempo -0.0008802 0.0020104 -0.438 0.6615
## valence -1.6385761 0.2894790 -5.660 1.51e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3102.1 on 3971 degrees of freedom
## Residual deviance: 2903.6 on 3951 degrees of freedom
## AIC: 2945.6
##
## Number of Fisher Scoring iterations: 6
Full Logistic Model \[ log(\frac{\hat \pi}{1 - \hat \pi}) = 5.059 - 0.009{Duration} - 1.472{Acousticness} + 0.311{Danceability} - 3.723{Energy} -2.809{Instrumentalness} \\ + 0.111{Key1} - 0.088{Key2} + 0.772{Key3} - 0.327{Key4} + 0.180{Key5} + 0.314{Key6} + 0.175{Key7} + 0.218{Key8} + 0.032{Key9} \\ + 0.152{Key10} + 0.059{Key11} + 0.350{Loudness} + 3.453{Speechiness} - 0.001{Tempo} -1.639{Valence} \]
The significant variables are Duration, Acousticness, Energy, Loudness, Speechiness, and Valence, along with Key3 (D#/E-flat Minor) relative to Key0 (C Minor).
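Since these coefficients are on the log-odds scale, exponentiating them gives odds ratios; for three of the significant predictors from the summary above:

```r
# Odds ratios for selected coefficients from the fitted logistic model above.
coefs <- c(duration = -0.0086620, loudness = 0.3497156, speechiness = 3.4530879)
exp(coefs)
# each one-unit increase multiplies the odds of being popular by this factor
```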
predict on test using 0.5 cutoff
v.prob.test <- predict(v.logit.model0, newdata = logit.test0, type = "response")
v.pred.test <- ifelse(v.prob.test > 0.5, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(v.pred.test), as.factor(logit.test0$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 606 93
## 1 1 1
##
## Accuracy : 0.8659
## 95% CI : (0.8384, 0.8903)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 0.5275
##
## Kappa : 0.0153
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.010638
## Specificity : 0.998353
## Pos Pred Value : 0.500000
## Neg Pred Value : 0.866953
## Prevalence : 0.134094
## Detection Rate : 0.001427
## Detection Prevalence : 0.002853
## Balanced Accuracy : 0.504495
##
## 'Positive' Class : 1
##
predict on optimal cutoff (0.18) for highest kappa
v.prob.test <- predict(v.logit.model0, newdata = logit.test0, type = "response")
v.pred.test <- ifelse(v.prob.test > v.opt.cut0, 1, 0) # using cutoff = 0.18
confusionMatrix(as.factor(v.pred.test), as.factor(logit.test0$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 482 67
## 1 125 27
##
## Accuracy : 0.7261
## 95% CI : (0.6915, 0.7588)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.0645
##
## Mcnemar's Test P-Value : 3.895e-05
##
## Sensitivity : 0.28723
## Specificity : 0.79407
## Pos Pred Value : 0.17763
## Neg Pred Value : 0.87796
## Prevalence : 0.13409
## Detection Rate : 0.03852
## Detection Prevalence : 0.21683
## Balanced Accuracy : 0.54065
##
## 'Positive' Class : 1
##
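opt.cut.func() and opt.cut.plot() come from thesis_functions.R. Assuming the cutoff search scores each candidate cutoff by Cohen's kappa (and by |sensitivity - specificity|), kappa itself can be computed directly from a 2x2 confusion matrix; checking against the 0.18-cutoff matrix just above:

```r
# Cohen's kappa from a 2x2 confusion matrix (rows = predicted, cols = reference).
kappa2x2 <- function(m) {
  n  <- sum(m)
  po <- sum(diag(m)) / n                    # observed agreement (accuracy)
  pe <- sum(rowSums(m) * colSums(m)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}
m <- matrix(c(482, 125, 67, 27), nrow = 2)  # confusion matrix at cutoff 0.18
round(kappa2x2(m), 4)  # 0.0645, matching the Kappa reported above
```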
Predict on optimal balance between sensitivity and specificity (0.14)
v.prob.test <- predict(v.logit.model0, newdata = logit.test0, type = "response")
v.pred.test <- ifelse(v.prob.test > 0.14, 1, 0) # using cutoff = 0.14
confusionMatrix(as.factor(v.pred.test), as.factor(logit.test0$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 383 42
## 1 224 52
##
## Accuracy : 0.6205
## 95% CI : (0.5835, 0.6566)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.1013
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.55319
## Specificity : 0.63097
## Pos Pred Value : 0.18841
## Neg Pred Value : 0.90118
## Prevalence : 0.13409
## Detection Rate : 0.07418
## Detection Prevalence : 0.39372
## Balanced Accuracy : 0.59208
##
## 'Positive' Class : 1
##
Stepwise 10 fold cross validation
set.seed(123)
train_control <- trainControl(method = "cv",
number = 10)
# Fit K-fold CV model
meh <- capture.output(step0_kcv <- train(popular ~ ., data = logit.train0, family = binomial(),
method = "glmStepAIC", trControl = train_control, verbose = FALSE))
step0_kcv <- step0_kcv$finalModel
step0_kcv
##
## Call: NULL
##
## Coefficients:
## (Intercept) duration acousticness energy
## 5.200722 -0.008548 -1.448020 -3.736311
## instrumentalness key3 key4 loudness
## -2.393322 0.732785 -0.373695 0.339930
## speechiness valence
## 3.171613 -1.494016
##
## Degrees of Freedom: 3270 Total (i.e. Null); 3261 Residual
## Null Deviance: 2609
## Residual Deviance: 2453 AIC: 2473
summary(step0_kcv)
##
## Call:
## NULL
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3942 -0.5881 -0.4721 -0.3445 2.5732
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.200722 0.738641 7.041 1.91e-12 ***
## duration -0.008548 0.001760 -4.858 1.19e-06 ***
## acousticness -1.448020 0.374716 -3.864 0.000111 ***
## energy -3.736311 0.600575 -6.221 4.93e-10 ***
## instrumentalness -2.393322 1.819973 -1.315 0.188499
## key3 0.732785 0.253987 2.885 0.003913 **
## key4 -0.373695 0.190967 -1.957 0.050365 .
## loudness 0.339930 0.045786 7.424 1.13e-13 ***
## speechiness 3.171613 0.744682 4.259 2.05e-05 ***
## valence -1.494016 0.279311 -5.349 8.85e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2609.2 on 3270 degrees of freedom
## Residual deviance: 2452.9 on 3261 degrees of freedom
## AIC: 2472.9
##
## Number of Fisher Scoring iterations: 6
kcv.prob.test0 <- predict(step0_kcv, newdata = key.dummy(logit.test0), type = "response")
kcv.pred.test0 <- ifelse(kcv.prob.test0 > 0.5, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(kcv.pred.test0), as.factor(logit.test0$popular), positive = "1")
Find optimal cut off value:
#search for best cutoff
kcv.out0 <- opt.cut.func(step0_kcv, key.dummy(logit.valid0))
opt.cut.plot(kcv.out0)
kcv.out0$cutoff[which.min(kcv.out0$ssdiff.vec)]
## [1] 0.14
kcv.out0.cut0 <- kcv.out0$cutoff[which.max(kcv.out0$kappa.vec)]
kcv.out0.cut0
## [1] 0.17
Fit final model (combo of train and validation)
set.seed(123)
finalmeh <- capture.output(step0_kcv.final0 <- train(popular ~ ., data = all.train0, family = binomial(),
method = "glmStepAIC", trControl = train_control, verbose = FALSE))
step0_kcv.final0 <- step0_kcv.final0$finalModel
step0_kcv.final0
##
## Call: NULL
##
## Coefficients:
## (Intercept) duration acousticness energy
## 5.37671 -0.00876 -1.52562 -3.86033
## instrumentalness key3 key4 loudness
## -2.73902 0.63876 -0.44345 0.35180
## speechiness valence
## 3.32807 -1.52259
##
## Degrees of Freedom: 3971 Total (i.e. Null); 3962 Residual
## Null Deviance: 3102
## Residual Deviance: 2908 AIC: 2928
summary(step0_kcv.final0)
##
## Call:
## NULL
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4190 -0.5794 -0.4622 -0.3346 2.6346
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.376709 0.682435 7.879 3.31e-15 ***
## duration -0.008760 0.001591 -5.506 3.67e-08 ***
## acousticness -1.525616 0.345842 -4.411 1.03e-05 ***
## energy -3.860331 0.556026 -6.943 3.85e-12 ***
## instrumentalness -2.739017 1.838644 -1.490 0.13630
## key3 0.638761 0.240138 2.660 0.00781 **
## key4 -0.443445 0.179981 -2.464 0.01375 *
## loudness 0.351801 0.042604 8.258 < 2e-16 ***
## speechiness 3.328066 0.678889 4.902 9.48e-07 ***
## valence -1.522593 0.257624 -5.910 3.42e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3102.1 on 3971 degrees of freedom
## Residual deviance: 2908.3 on 3962 degrees of freedom
## AIC: 2928.3
##
## Number of Fisher Scoring iterations: 6
Stepwise 10 Fold Cross Validation:
Compared to the full logistic regression model, the stepwise model leaves out Danceability, Key1, Key2, Key5 through Key11, and Tempo.
\[ log(\frac{\hat \pi}{1 - \hat \pi}) = 5.377 - 0.009{Duration} - 1.525{Acousticness} - 3.860{Energy} - 2.739{Instrumentalness} \\ + 0.639{Key3} - 0.443{Key4} + 0.351{Loudness} + 3.328{Speechiness} - 1.522{Valence} \]
The significant variables in the model are Duration, Acousticness, Energy, Key3 (D#/E-flat Minor), Key4 (E Minor), Loudness, Speechiness, and Valence.
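Plugging a hypothetical song into the stepwise equation illustrates how the model scores a probability (all feature values below are made up for illustration, not taken from the data):

```r
# Stepwise-model coefficients from the summary above.
b <- c(intercept = 5.376709, duration = -0.008760, acousticness = -1.525616,
       energy = -3.860331, instrumentalness = -2.739017, key3 = 0.638761,
       key4 = -0.443445, loudness = 0.351801, speechiness = 3.328066,
       valence = -1.522593)
# Hypothetical song: duration 210, low acousticness, high energy,
# not in key3/key4, loudness -4, low speechiness, middling valence.
x <- c(210, 0.2, 0.8, 0, 0, 0, -4, 0.1, 0.5)
logit <- b[1] + sum(b[-1] * x)
plogis(logit)  # predicted probability that the song is "popular"
```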
predict on test using 0.5 cutoff
kcv.prob.test0 <- predict(step0_kcv.final0, newdata = key.dummy(logit.test0), type = "response")
kcv.pred.test0 <- ifelse(kcv.prob.test0 > 0.5, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(kcv.pred.test0), as.factor(logit.test0$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 606 93
## 1 1 1
##
## Accuracy : 0.8659
## 95% CI : (0.8384, 0.8903)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 0.5275
##
## Kappa : 0.0153
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.010638
## Specificity : 0.998353
## Pos Pred Value : 0.500000
## Neg Pred Value : 0.866953
## Prevalence : 0.134094
## Detection Rate : 0.001427
## Detection Prevalence : 0.002853
## Balanced Accuracy : 0.504495
##
## 'Positive' Class : 1
##
predict on test using optimal kappa, 0.17 cutoff
kcv.prob.test0 <- predict(step0_kcv.final0, newdata = key.dummy(logit.test0), type = "response")
kcv.pred.test0 <- ifelse(kcv.prob.test0 > 0.17, 1, 0) # using cutoff = 0.17
confusionMatrix(as.factor(kcv.pred.test0), as.factor(logit.test0$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 475 65
## 1 132 29
##
## Accuracy : 0.719
## 95% CI : (0.6841, 0.752)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.07
##
## Mcnemar's Test P-Value : 2.572e-06
##
## Sensitivity : 0.30851
## Specificity : 0.78254
## Pos Pred Value : 0.18012
## Neg Pred Value : 0.87963
## Prevalence : 0.13409
## Detection Rate : 0.04137
## Detection Prevalence : 0.22967
## Balanced Accuracy : 0.54552
##
## 'Positive' Class : 1
##
predict on test using 0.14 cutoff, optimal balance between sensitivity and specificity
kcv.prob.test0 <- predict(step0_kcv.final0, newdata = key.dummy(logit.test0), type = "response")
kcv.pred.test0 <- ifelse(kcv.prob.test0 > 0.14, 1, 0) # using cutoff = 0.14
confusionMatrix(as.factor(kcv.pred.test0), as.factor(logit.test0$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 397 45
## 1 210 49
##
## Accuracy : 0.6362
## 95% CI : (0.5994, 0.6719)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.1007
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.5213
## Specificity : 0.6540
## Pos Pred Value : 0.1892
## Neg Pred Value : 0.8982
## Prevalence : 0.1341
## Detection Rate : 0.0699
## Detection Prevalence : 0.3695
## Balanced Accuracy : 0.5877
##
## 'Positive' Class : 1
##
set.seed(123)
glmulti.out0 <- glmulti(popular ~ ., data = logit.train0,
level = 1, # No interaction considered
method = "h", # Exhaustive approach
crit = "aic", # AIC as criteria
confsetsize = 5, # Keep 5 best models
plotty = F, report = F, # No plot or interim reports
fitfunction = "glm", # lm function
family = binomial) # binomial to run logistic regression
#glmulti.out0@formulas
view summary of top model
summary(glmulti.out0@objects[[1]])
##
## Call:
## fitfunc(formula = as.formula(x), family = ..1, data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3900 -0.5880 -0.4766 -0.3530 2.4475
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.222133 0.734876 7.106 1.19e-12 ***
## duration -0.008472 0.001757 -4.822 1.42e-06 ***
## acousticness -1.456307 0.374863 -3.885 0.000102 ***
## energy -3.820021 0.597550 -6.393 1.63e-10 ***
## instrumentalness -2.385559 1.826735 -1.306 0.191582
## loudness 0.343749 0.045699 7.522 5.39e-14 ***
## speechiness 3.090051 0.744032 4.153 3.28e-05 ***
## valence -1.418107 0.277458 -5.111 3.20e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2609.2 on 3270 degrees of freedom
## Residual deviance: 2465.1 on 3263 degrees of freedom
## AIC: 2481.1
##
## Number of Fisher Scoring iterations: 6
Store model
allreg.logit0 <- glmulti.out0@objects[[1]]
#search for best cutoff
allreg.out0 <- opt.cut.func(allreg.logit0, logit.valid0)
opt.cut.plot(allreg.out0)
allreg.out0$cutoff[which.min(allreg.out0$ssdiff.vec)]
## [1] 0.15
allreg.out0.cut0 <- allreg.out0$cutoff[which.max(allreg.out0$kappa.vec)]
allreg.out0.cut0
## [1] 0.14
fit final model to combo of training and validation
set.seed(123)
glmulti.out0 <- glmulti(popular ~ ., data = all.train0,
level = 1, # No interaction considered
method = "h", # Exhaustive approach
crit = "aic", # AIC as criteria
confsetsize = 5, # Keep 5 best models
plotty = F, report = F, # No plot or interim reports
fitfunction = "glm", # lm function
family = binomial) # binomial to run logistic regression
#glmulti.out0@formulas
summary(glmulti.out0@objects[[1]])
##
## Call:
## fitfunc(formula = as.formula(x), family = ..1, data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4130 -0.5781 -0.4653 -0.3432 2.4901
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.369288 0.678471 7.914 2.50e-15 ***
## duration -0.008593 0.001584 -5.426 5.75e-08 ***
## acousticness -1.537242 0.345553 -4.449 8.64e-06 ***
## energy -3.954820 0.552814 -7.154 8.43e-13 ***
## instrumentalness -2.722274 1.849908 -1.472 0.141
## loudness 0.354367 0.042542 8.330 < 2e-16 ***
## speechiness 3.278056 0.677868 4.836 1.33e-06 ***
## valence -1.444874 0.255848 -5.647 1.63e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3102.1 on 3971 degrees of freedom
## Residual deviance: 2922.2 on 3964 degrees of freedom
## AIC: 2938.2
##
## Number of Fisher Scoring iterations: 6
All Possible Regression:
Compared to the full logistic model, the model from all-subsets regression leaves out Danceability, all Key dummy variables, and Tempo.
\[ log(\frac{\hat \pi}{1 - \hat \pi}) = 5.369 - 0.009{Duration} - 1.537{Acousticness} - 3.955{Energy} - 2.722{Instrumentalness} \\ + 0.354{Loudness} + 3.278{Speechiness} - 1.445{Valence} \]
store final model
allreg.logit0.final <- glmulti.out0@objects[[1]]
Predictions with 0.5 as the cut off
allreg.prob.test0 <- predict(allreg.logit0.final, newdata = logit.test0, type = "response")
allreg.pred.test0 <- ifelse(allreg.prob.test0 > 0.5, 1, 0) # using standard cutoff 0.5
confusionMatrix(as.factor(allreg.pred.test0), as.factor(logit.test0$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 606 93
## 1 1 1
##
## Accuracy : 0.8659
## 95% CI : (0.8384, 0.8903)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 0.5275
##
## Kappa : 0.0153
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.010638
## Specificity : 0.998353
## Pos Pred Value : 0.500000
## Neg Pred Value : 0.866953
## Prevalence : 0.134094
## Detection Rate : 0.001427
## Detection Prevalence : 0.002853
## Balanced Accuracy : 0.504495
##
## 'Positive' Class : 1
##
Predictions where cut off yields the best kappa, 0.14
allreg.prob.test0 <- predict(allreg.logit0.final, newdata = logit.test0, type = "response")
allreg.pred.test0 <- ifelse(allreg.prob.test0 > allreg.out0.cut0, 1, 0) # using cutoff 0.14
confusionMatrix(as.factor(allreg.pred.test0), as.factor(logit.test0$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 404 42
## 1 203 52
##
## Accuracy : 0.6505
## 95% CI : (0.6139, 0.6858)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.1269
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.55319
## Specificity : 0.66557
## Pos Pred Value : 0.20392
## Neg Pred Value : 0.90583
## Prevalence : 0.13409
## Detection Rate : 0.07418
## Detection Prevalence : 0.36377
## Balanced Accuracy : 0.60938
##
## 'Positive' Class : 1
##
Predictions where cut off is the best balance of sensitivity and specificity, 0.15
allreg.prob.test0 <- predict(allreg.logit0.final, newdata = logit.test0, type = "response")
allreg.pred.test0 <- ifelse(allreg.prob.test0 > 0.15, 1, 0) # using cutoff 0.15
confusionMatrix(as.factor(allreg.pred.test0), as.factor(logit.test0$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 435 51
## 1 172 43
##
## Accuracy : 0.6819
## 95% CI : (0.646, 0.7162)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.1128
##
## Mcnemar's Test P-Value : 9.297e-16
##
## Sensitivity : 0.45745
## Specificity : 0.71664
## Pos Pred Value : 0.20000
## Neg Pred Value : 0.89506
## Prevalence : 0.13409
## Detection Rate : 0.06134
## Detection Prevalence : 0.30670
## Balanced Accuracy : 0.58704
##
## 'Positive' Class : 1
##
set.seed(123)
ridge0 <- train(popular ~ ., data = logit.train0, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = 0,lambda = seq(0.001,0.1,by = 0.001)))
#best tuning parameter
ridge0$bestTune
## alpha lambda
## 100 0 0.1
ridge0.model <- coef(ridge0$finalModel, ridge0$bestTune$lambda)
ridge0.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 0.3812536494
## duration -0.0042651992
## acousticness -0.3997989337
## danceability -0.1441080290
## energy -0.6313909327
## instrumentalness -0.8210624307
## key1 -0.0140152818
## key2 -0.1557323229
## key3 0.4654206935
## key4 -0.1797053979
## key5 0.0291019833
## key6 0.0724378741
## key7 0.0623824309
## key8 -0.0095258891
## key9 0.0216445665
## key10 0.0301392242
## key11 -0.0311864381
## loudness 0.0852587410
## speechiness 1.4727602887
## tempo 0.0001278062
## valence -0.7066431211
Search for best cutoff using validation set
ridge0.out <- reg.opt.cut.func(ridge0, logit.valid0)
opt.cut.plot(ridge0.out)
# cut off by kappa
ridge0.out$cutoff[which.max(ridge0.out$kappa.vec)]
## [1] 0.15
ridge0.out$cutoff[which.min(ridge0.out$ssdiff.vec)]
## [1] 0.14
create final model
set.seed(123)
ridge0 <- train(popular ~ ., data = all.train0, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = 0,lambda = seq(0.001,0.1,by = 0.001)))
#best tuning parameter
ridge0$bestTune
## alpha lambda
## 100 0 0.1
ridge0.model <- coef(ridge0$finalModel, ridge0$bestTune$lambda)
ridge0.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 0.321762194
## duration -0.004289330
## acousticness -0.409439651
## danceability -0.062528153
## energy -0.607499947
## instrumentalness -0.839357369
## key1 0.013386505
## key2 -0.091231783
## key3 0.400527208
## key4 -0.198298922
## key5 0.053748073
## key6 0.142443327
## key7 0.036589828
## key8 0.030302380
## key9 -0.041870877
## key10 0.034094167
## key11 -0.034462056
## loudness 0.084161023
## speechiness 1.615890923
## tempo -0.000348579
## valence -0.724448093
Ridge 10 Fold Cross Validation:
\[ log(\frac{\hat \pi}{1 - \hat \pi}) = 0.322 - 0.004{Duration} - 0.409{Acousticness} - 0.063{Danceability} - 0.607{Energy} - 0.839{Instrumentalness} \\ + 0.013{Key1} - 0.091{Key2} + 0.401{Key3} - 0.198{Key4} + 0.054{Key5} + 0.142{Key6} + 0.037{Key7} + 0.030{Key8} - 0.042{Key9} \\ + 0.034{Key10} - 0.034{Key11} + 0.084{Loudness} + 1.616{Speechiness} - 0.0003{Tempo} - 0.724{Valence} \]
predict and evaluate on the test set where cutoff is at 0.5
prob.ridge0 <- predict(ridge0, s = ridge0$bestTune, logit.test0, type = "prob")
pred.ridge0 <- ifelse(prob.ridge0[,2] > 0.50, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(pred.ridge0), as.factor(logit.test0$popular),
positive = "1")
## Warning in confusionMatrix.default(as.factor(pred.ridge0),
## as.factor(logit.test0$popular), : Levels are not in the same order for reference
## and data. Refactoring data to match.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 607 94
## 1 0 0
##
## Accuracy : 0.8659
## 95% CI : (0.8384, 0.8903)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 0.5275
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.0000
## Specificity : 1.0000
## Pos Pred Value : NaN
## Neg Pred Value : 0.8659
## Prevalence : 0.1341
## Detection Rate : 0.0000
## Detection Prevalence : 0.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 1
##
predict and evaluate on the test set where cutoff is at 0.15 corresponding to optimal kappa
prob.ridge0 <- predict(ridge0, s = ridge0$bestTune, logit.test0, type = "prob")
pred.ridge0 <- ifelse(prob.ridge0[,2] > 0.15, 1, 0) # using cutoff = 0.15
confusionMatrix(as.factor(pred.ridge0), as.factor(logit.test0$popular),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 464 63
## 1 143 31
##
## Accuracy : 0.7061
## 95% CI : (0.6709, 0.7396)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.0693
##
## Mcnemar's Test P-Value : 3.709e-08
##
## Sensitivity : 0.32979
## Specificity : 0.76442
## Pos Pred Value : 0.17816
## Neg Pred Value : 0.88046
## Prevalence : 0.13409
## Detection Rate : 0.04422
## Detection Prevalence : 0.24822
## Balanced Accuracy : 0.54710
##
## 'Positive' Class : 1
##
predict and evaluate on the test set where cutoff is at 0.14 corresponding to optimal sensitivity and specificity balance
prob.ridge0 <- predict(ridge0, s = ridge0$bestTune, logit.test0, type = "prob")
pred.ridge0 <- ifelse(prob.ridge0[,2] > 0.14, 1, 0) # using cutoff = 0.14
confusionMatrix(as.factor(pred.ridge0), as.factor(logit.test0$popular),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 403 48
## 1 204 46
##
## Accuracy : 0.6405
## 95% CI : (0.6037, 0.6761)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.0901
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.48936
## Specificity : 0.66392
## Pos Pred Value : 0.18400
## Neg Pred Value : 0.89357
## Prevalence : 0.13409
## Detection Rate : 0.06562
## Detection Prevalence : 0.35663
## Balanced Accuracy : 0.57664
##
## 'Positive' Class : 1
##
lasso0 <- train(popular ~ ., data = logit.train0, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = 1,lambda = seq(0.001,0.1,by = 0.001)))
# best parameter
lasso0$bestTune
## alpha lambda
## 100 1 0.1
# best coefficient
lasso0.model <- coef(lasso0$finalModel, lasso0$bestTune$lambda)
lasso0.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -1.843351
## duration .
## acousticness .
## danceability .
## energy .
## instrumentalness .
## key1 .
## key2 .
## key3 .
## key4 .
## key5 .
## key6 .
## key7 .
## key8 .
## key9 .
## key10 .
## key11 .
## loudness .
## speechiness .
## tempo .
## valence .
Search for best cutoff using the validation set. Note that the selected lambda (0.1, the largest value in the grid) shrinks every coefficient to zero, leaving an intercept-only model: every song receives the same predicted probability, so the cutoff search is degenerate (both criteria return a cutoff of 0).
lasso0.out <- reg.opt.cut.func(lasso0, logit.valid0)
opt.cut.plot(lasso0.out)
# cut off by kappa
lasso0.out$cutoff[which.max(lasso0.out$kappa.vec)]
## [1] 0
lasso0.out$cutoff[which.min(lasso0.out$ssdiff.vec)]
## [1] 0
lasso0.final <- train(popular ~ ., data = all.train0, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = 1,lambda = seq(0.001,0.1,by = 0.001)))
# best parameter
lasso0.final$bestTune
## alpha lambda
## 100 1 0.1
# best coefficient
lasso0.model.final <- coef(lasso0.final$finalModel, lasso0.final$bestTune$lambda)
lasso0.model.final
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -1.881861
## duration .
## acousticness .
## danceability .
## energy .
## instrumentalness .
## key1 .
## key2 .
## key3 .
## key4 .
## key5 .
## key6 .
## key7 .
## key8 .
## key9 .
## key10 .
## key11 .
## loudness .
## speechiness .
## tempo .
## valence .
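With every slope shrunk to zero, the final lasso model is intercept-only, so each song receives the same predicted probability: the inverse-logit of the intercept printed above.

```r
# Constant predicted probability implied by the intercept-only lasso model.
plogis(-1.881861)  # ~0.132, close to the 0.1341 prevalence of popular songs
```

This explains the confusion matrices that follow: a cutoff above this constant classifies every song as unpopular, and a cutoff below it classifies every song as popular.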
predict and evaluate on the test set where cutoff is at 0.5
prob.lasso0 <- predict(lasso0.final, s = lasso0.final$bestTune, logit.test0, type = "prob")
pred.lasso0 <- ifelse(prob.lasso0[,2] > 0.5, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(pred.lasso0), as.factor(logit.test0$popular),
positive = "1")
## Warning in confusionMatrix.default(as.factor(pred.lasso0),
## as.factor(logit.test0$popular), : Levels are not in the same order for reference
## and data. Refactoring data to match.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 607 94
## 1 0 0
##
## Accuracy : 0.8659
## 95% CI : (0.8384, 0.8903)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 0.5275
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.0000
## Specificity : 1.0000
## Pos Pred Value : NaN
## Neg Pred Value : 0.8659
## Prevalence : 0.1341
## Detection Rate : 0.0000
## Detection Prevalence : 0.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 1
##
predict and evaluate on the test set where cutoff is at 0.11
prob.lasso0 <- predict(lasso0.final, s = lasso0.final$bestTune, logit.test0, type = "prob")
pred.lasso0 <- ifelse(prob.lasso0[,2] > 0.11, 1, 0) # using cutoff = 0.11
confusionMatrix(as.factor(pred.lasso0), as.factor(logit.test0$popular),
positive = "1")
## Warning in confusionMatrix.default(as.factor(pred.lasso0),
## as.factor(logit.test0$popular), : Levels are not in the same order for reference
## and data. Refactoring data to match.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 0 0
## 1 607 94
##
## Accuracy : 0.1341
## 95% CI : (0.1097, 0.1616)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 1.0000
## Specificity : 0.0000
## Pos Pred Value : 0.1341
## Neg Pred Value : NaN
## Prevalence : 0.1341
## Detection Rate : 0.1341
## Detection Prevalence : 1.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 1
##
set.seed(123)
enet0 <- train(popular ~ ., data = logit.train0, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = seq(0,1,by = 0.05),lambda = seq(0.001,0.1,by = 0.001)))
# best parameter
enet0$bestTune
## alpha lambda
## 100 0 0.1
# best coefficient
enet0.model <- coef(enet0$finalModel, enet0$bestTune$lambda)
enet0.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 0.3812536494
## duration -0.0042651992
## acousticness -0.3997989337
## danceability -0.1441080290
## energy -0.6313909327
## instrumentalness -0.8210624307
## key1 -0.0140152818
## key2 -0.1557323229
## key3 0.4654206935
## key4 -0.1797053979
## key5 0.0291019833
## key6 0.0724378741
## key7 0.0623824309
## key8 -0.0095258891
## key9 0.0216445665
## key10 0.0301392242
## key11 -0.0311864381
## loudness 0.0852587410
## speechiness 1.4727602887
## tempo 0.0001278062
## valence -0.7066431211
search for best cutoff with validation set
enet0.out <- reg.opt.cut.func(enet0, logit.valid0)
opt.cut.plot(enet0.out)
# cut off by kappa
enet0.out$cutoff[which.max(enet0.out$kappa.vec)]
## [1] 0.15
enet0.out$cutoff[which.min(enet0.out$ssdiff.vec)]
## [1] 0.14
create final model on train and validation
set.seed(123)
enet0 <- train(popular ~ ., data = all.train0, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = seq(0,1,by = 0.05),lambda = seq(0.001,0.1,by = 0.001)))
# best parameter
enet0$bestTune
## alpha lambda
## 100 0 0.1
# best coefficient
enet0.model <- coef(enet0$finalModel, enet0$bestTune$lambda)
enet0.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 0.321762194
## duration -0.004289330
## acousticness -0.409439651
## danceability -0.062528153
## energy -0.607499947
## instrumentalness -0.839357369
## key1 0.013386505
## key2 -0.091231783
## key3 0.400527208
## key4 -0.198298922
## key5 0.053748073
## key6 0.142443327
## key7 0.036589828
## key8 0.030302380
## key9 -0.041870877
## key10 0.034094167
## key11 -0.034462056
## loudness 0.084161023
## speechiness 1.615890923
## tempo -0.000348579
## valence -0.724448093
Elastic Net 10-Fold Cross-Validation (the selected alpha of 0 means the penalty reduced to pure ridge here) \[ log(\frac{\hat \pi}{1 - \hat \pi}) = 0.322 - 0.004{Duration} - 0.409{Acousticness} - 0.063{Danceability} - 0.607{Energy} - 0.839{Instrumentalness} \\ + 0.013{Key1} - 0.091{Key2} + 0.401{Key3} - 0.198{Key4} + 0.054{Key5} + 0.142{Key6} + 0.037{Key7} + 0.030{Key8} \\ - 0.042{Key9} + 0.034{Key10} - 0.034{Key11} + 0.084{Loudness} + 1.616{Speechiness} - 0.0003{Tempo} - 0.724{Valence} \]
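The equation is on the logit scale; a fitted log-odds value is converted to a probability with the inverse logit. A minimal sketch — the feature values below are hypothetical, and only three of the model's terms are used, purely to illustrate the arithmetic:

```r
# Convert a fitted log-odds value to a probability with the inverse logit.
# The feature values are hypothetical; the remaining terms are treated as zero.
log_odds <- 0.322 -          # intercept
  0.004 * 200 +              # duration (hypothetical value)
  1.616 * 0.10 -             # speechiness (hypothetical value)
  0.724 * 0.50               # valence (hypothetical value)
plogis(log_odds)             # 1 / (1 + exp(-log_odds)), roughly 0.34
```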
predict and evaluate on the test set where cutoff is at 0.5
prob.enet0 <- predict(enet0, s = enet0$bestTune, logit.test0, type = "prob")
pred.enet0 <- ifelse(prob.enet0[,2] > 0.50, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(pred.enet0), as.factor(logit.test0$popular),
positive = "1")
## Warning in confusionMatrix.default(as.factor(pred.enet0),
## as.factor(logit.test0$popular), : Levels are not in the same order for reference
## and data. Refactoring data to match.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 607 94
## 1 0 0
##
## Accuracy : 0.8659
## 95% CI : (0.8384, 0.8903)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 0.5275
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.0000
## Specificity : 1.0000
## Pos Pred Value : NaN
## Neg Pred Value : 0.8659
## Prevalence : 0.1341
## Detection Rate : 0.0000
## Detection Prevalence : 0.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 1
##
predict and evaluate on the test set where cutoff is at 0.15 corresponding to optimal kappa
prob.enet0 <- predict(enet0, s = enet0$bestTune, logit.test0, type = "prob")
pred.enet0 <- ifelse(prob.enet0[,2] > 0.15, 1, 0) # using cutoff = 0.15
confusionMatrix(as.factor(pred.enet0), as.factor(logit.test0$popular),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 464 63
## 1 143 31
##
## Accuracy : 0.7061
## 95% CI : (0.6709, 0.7396)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.0693
##
## Mcnemar's Test P-Value : 3.709e-08
##
## Sensitivity : 0.32979
## Specificity : 0.76442
## Pos Pred Value : 0.17816
## Neg Pred Value : 0.88046
## Prevalence : 0.13409
## Detection Rate : 0.04422
## Detection Prevalence : 0.24822
## Balanced Accuracy : 0.54710
##
## 'Positive' Class : 1
##
predict and evaluate on the test set where cutoff is at 0.14 corresponding to optimal balance of sensitivity and specificity
prob.enet0 <- predict(enet0, s = enet0$bestTune, logit.test0, type = "prob")
pred.enet0 <- ifelse(prob.enet0[,2] > 0.14, 1, 0) # using cutoff = 0.14
confusionMatrix(as.factor(pred.enet0), as.factor(logit.test0$popular),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 403 48
## 1 204 46
##
## Accuracy : 0.6405
## 95% CI : (0.6037, 0.6761)
## No Information Rate : 0.8659
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.0901
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.48936
## Specificity : 0.66392
## Pos Pred Value : 0.18400
## Neg Pred Value : 0.89357
## Prevalence : 0.13409
## Detection Rate : 0.06562
## Detection Prevalence : 0.35663
## Balanced Accuracy : 0.57664
##
## 'Positive' Class : 1
##
Diagnostic measures
results.logit0 <- data.frame(model = c("Full", "Step 10CV", "All Subsets", "Ridge 10CV", "Elastic Net 10CV"),
cutoff = c(0.14, 0.14, 0.14, 0.14, 0.14),
AIC = c(2946, 2928, 2938.2, NA, NA),
Accuracy =c(0.6205, 0.6362, 0.6505, 0.6405, 0.6405),
Kappa = c(0.1013, 0.1007, 0.1269, 0.0901, 0.0901),
Sensitivity = c(0.55319, 0.5213, 0.55319, 0.48936, 0.48936),
Specificity = c(0.63097, 0.6540, 0.66557, 0.66392, 0.66392))
results.logit0
## model cutoff AIC Accuracy Kappa Sensitivity Specificity
## 1 Full 0.14 2946.0 0.6205 0.1013 0.55319 0.63097
## 2 Step 10CV 0.14 2928.0 0.6362 0.1007 0.52130 0.65400
## 3 All Subsets 0.14 2938.2 0.6505 0.1269 0.55319 0.66557
## 4 Ridge 10CV 0.14 NA 0.6405 0.0901 0.48936 0.66392
## 5 Elastic Net 10CV 0.14 NA 0.6405 0.0901 0.48936 0.66392
The all-subsets regression model performs best: it has the highest accuracy and specificity, and it ties the full model for the highest sensitivity.
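The Kappa column measures agreement beyond chance, and can be checked by hand from a confusion matrix. Using the ridge / elastic net test-set counts above (cutoff 0.14):

```r
# Recompute Cohen's kappa from the ridge / elastic net confusion matrix:
# rows are predictions, columns the reference (TN = 403, FP = 204, FN = 48, TP = 46).
cm <- matrix(c(403, 204, 48, 46), nrow = 2)
n  <- sum(cm)
po <- sum(diag(cm)) / n                       # observed agreement
pe <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
(po - pe) / (1 - pe)                          # ~0.0901, matching the table
```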
All Possible Regressions:
Compared to the full logistic model, the all-subsets regression model leaves out Danceability, all of the Key dummy variables, and Tempo.
\[ log(\frac{\hat \pi}{1 - \hat \pi}) = 5.369 - 0.009{Duration} - 1.537{Acousticness} - 3.955{Energy} - 2.722{Instrumentalness} \\ + 0.354{Loudness} + 3.278{Speechiness} - 1.445{Valence} \]
fitting logistic model using combo of train/valid/test, finding optimal model using training data.
# Fit logistic model on training data
v.logit.model1 <- glm(popular ~ ., family=binomial(link='logit'),data= logit.train1)
#search for best cutoff
out1 <- opt.cut.func(v.logit.model1, logit.valid1)
opt.cut.plot(out1)
out1$cutoff[which.min(out1$ssdiff.vec)]
## [1] 0.11
v.opt.cut1 <- out1$cutoff[which.max(out1$kappa.vec)]
v.opt.cut1
## [1] 0.15
Fit final model (combo of train and validation)
v.model.final1 <- glm(popular ~ ., data=all.train1, family=binomial(link='logit'))
summary(v.model.final1)
##
## Call:
## glm(formula = popular ~ ., family = binomial(link = "logit"),
## data = all.train1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6592 -0.5252 -0.4249 -0.3197 2.7327
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.197998 0.707127 4.523 6.11e-06 ***
## duration -0.007794 0.001373 -5.677 1.37e-08 ***
## acousticness -0.747392 0.251627 -2.970 0.00298 **
## danceability 0.934948 0.422723 2.212 0.02699 *
## energy -3.456932 0.480703 -7.191 6.41e-13 ***
## instrumentalness -12.325417 7.332297 -1.681 0.09277 .
## key1 0.133834 0.157571 0.849 0.39568
## key2 0.070416 0.167327 0.421 0.67388
## key3 0.585095 0.231789 2.524 0.01159 *
## key4 -0.158981 0.233397 -0.681 0.49577
## key5 0.030558 0.197109 0.155 0.87680
## key6 0.052258 0.193471 0.270 0.78708
## key7 0.173937 0.152944 1.137 0.25543
## key8 0.344365 0.168571 2.043 0.04107 *
## key9 0.274620 0.182927 1.501 0.13329
## key10 -0.385177 0.276928 -1.391 0.16426
## key11 -0.070512 0.212423 -0.332 0.73993
## loudness 0.329966 0.035393 9.323 < 2e-16 ***
## speechiness 4.733615 0.579107 8.174 2.98e-16 ***
## tempo 0.002791 0.001580 1.767 0.07728 .
## valence -1.385900 0.250695 -5.528 3.23e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4359.9 on 6237 degrees of freedom
## Residual deviance: 4089.1 on 6217 degrees of freedom
## AIC: 4131.1
##
## Number of Fisher Scoring iterations: 9
Full Logistic Model \[ log(\frac{\hat \pi}{1 - \hat \pi}) = 3.198 - 0.008{Duration} - 0.747{Acousticness} + 0.935{Danceability} - 3.457{Energy} - 12.325{Instrumentalness} \\ + 0.134{Key1} + 0.070{Key2} + 0.585{Key3} - 0.159{Key4} + 0.031{Key5} + 0.052{Key6} + 0.174{Key7} + 0.344{Key8} + 0.275{Key9} \\ - 0.385{Key10} - 0.071{Key11} + 0.330{Loudness} + 4.734{Speechiness} + 0.003{Tempo} - 1.386{Valence} \]
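Exponentiating a logistic coefficient gives an odds ratio, which is often easier to interpret than the log-odds scale. A quick sketch using two of the coefficients from the summary above:

```r
# exp(beta) is the multiplicative change in the odds of being popular
# per one-unit increase in a predictor, holding the others fixed.
exp(0.330)   # loudness: each extra dB multiplies the odds by about 1.39
exp(-1.386)  # valence: a one-unit increase cuts the odds to about a quarter
```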
The variables Duration, Acousticness, Danceability, Energy, Key3 (D#/E-flat), Key8 (G#/A-flat), Loudness, Speechiness, and Valence are significant in the model.
predict on test using 0.5 cutoff
v.prob.test1 <- predict(v.logit.model1, newdata = logit.test1, type = "response")
v.pred.test1 <- ifelse(v.prob.test1 > 0.5, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(v.pred.test1), as.factor(logit.test1$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 987 113
## 1 1 0
##
## Accuracy : 0.8965
## 95% CI : (0.8769, 0.9138)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 0.5643
##
## Kappa : -0.0018
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.0000000
## Specificity : 0.9989879
## Pos Pred Value : 0.0000000
## Neg Pred Value : 0.8972727
## Prevalence : 0.1026340
## Detection Rate : 0.0000000
## Detection Prevalence : 0.0009083
## Balanced Accuracy : 0.4994939
##
## 'Positive' Class : 1
##
predict on test using optimal cutoff (kappa), 0.15
v.prob.test1 <- predict(v.logit.model1, newdata = logit.test1, type = "response")
v.pred.test1 <- ifelse(v.prob.test1 > v.opt.cut1, 1, 0) # using cutoff = 0.15
confusionMatrix(as.factor(v.pred.test1), as.factor(logit.test1$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 817 72
## 1 171 41
##
## Accuracy : 0.7793
## 95% CI : (0.7536, 0.8035)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.1367
##
## Mcnemar's Test P-Value : 3.243e-10
##
## Sensitivity : 0.36283
## Specificity : 0.82692
## Pos Pred Value : 0.19340
## Neg Pred Value : 0.91901
## Prevalence : 0.10263
## Detection Rate : 0.03724
## Detection Prevalence : 0.19255
## Balanced Accuracy : 0.59488
##
## 'Positive' Class : 1
##
predict on test using optimal cutoff (sensitivity and specificity), 0.11
v.prob.test1 <- predict(v.logit.model1, newdata = logit.test1, type = "response")
v.pred.test1 <- ifelse(v.prob.test1 > 0.11, 1, 0) # using cutoff = 0.11
confusionMatrix(as.factor(v.pred.test1), as.factor(logit.test1$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 609 43
## 1 379 70
##
## Accuracy : 0.6167
## 95% CI : (0.5873, 0.6455)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.1018
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.61947
## Specificity : 0.61640
## Pos Pred Value : 0.15590
## Neg Pred Value : 0.93405
## Prevalence : 0.10263
## Detection Rate : 0.06358
## Detection Prevalence : 0.40781
## Balanced Accuracy : 0.61793
##
## 'Positive' Class : 1
##
step-wise 10 fold cross validation
set.seed(123)
train_control <- trainControl(method = "cv",
number = 10)
# Fit K-fold CV model
meh <- capture.output(step1_kcv <- train(popular ~ ., data = logit.train1, family = binomial(),
method = "glmStepAIC", trControl = train_control, verbose = FALSE))
step1_kcv <- step1_kcv$finalModel
step1_kcv
##
## Call: NULL
##
## Coefficients:
## (Intercept) duration acousticness energy
## 4.073473 -0.007853 -0.844912 -3.381390
## instrumentalness key3 key4 key10
## -13.099381 0.632739 -0.501977 -0.372704
## loudness speechiness valence
## 0.321980 5.084945 -1.236185
##
## Degrees of Freedom: 5136 Total (i.e. Null); 5126 Residual
## Null Deviance: 3568
## Residual Deviance: 3357 AIC: 3379
summary(step1_kcv)
##
## Call:
## NULL
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8184 -0.5225 -0.4259 -0.3238 2.7590
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.073473 0.632373 6.442 1.18e-10 ***
## duration -0.007853 0.001505 -5.217 1.82e-07 ***
## acousticness -0.844912 0.272201 -3.104 0.00191 **
## energy -3.381390 0.532916 -6.345 2.22e-10 ***
## instrumentalness -13.099381 8.926829 -1.467 0.14226
## key3 0.632739 0.221482 2.857 0.00428 **
## key4 -0.501977 0.254431 -1.973 0.04850 *
## key10 -0.372704 0.268693 -1.387 0.16541
## loudness 0.321980 0.038794 8.300 < 2e-16 ***
## speechiness 5.084945 0.633454 8.027 9.96e-16 ***
## valence -1.236185 0.248294 -4.979 6.40e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3568.2 on 5136 degrees of freedom
## Residual deviance: 3357.5 on 5126 degrees of freedom
## AIC: 3379.5
##
## Number of Fisher Scoring iterations: 9
#search for best cutoff
kcv.out1 <- opt.cut.func(step1_kcv, key.dummy(logit.valid1))
opt.cut.plot(kcv.out1)
kcv.out1$cutoff[which.min(kcv.out1$ssdiff.vec)]
## [1] 0.11
kcv.out1.cut1 <- kcv.out1$cutoff[which.max(kcv.out1$kappa.vec)]
kcv.out1.cut1
## [1] 0.13
Fit final model (combo of train and validation)
set.seed(123)
finalmeh <- capture.output(step1_kcv.final <- train(popular ~ ., data = all.train1, family = binomial(),
method = "glmStepAIC", trControl = train_control, verbose = FALSE))
step1_kcv.final <- step1_kcv.final$finalModel
step1_kcv.final
##
## Call: NULL
##
## Coefficients:
## (Intercept) duration acousticness danceability
## 3.279565 -0.007749 -0.777317 0.976056
## energy instrumentalness key3 key8
## -3.477833 -12.106204 0.512832 0.267298
## key10 loudness speechiness tempo
## -0.459592 0.333033 4.790242 0.002819
## valence
## -1.411113
##
## Degrees of Freedom: 6237 Total (i.e. Null); 6225 Residual
## Null Deviance: 4360
## Residual Deviance: 4095 AIC: 4121
summary(step1_kcv.final)
##
## Call:
## NULL
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6496 -0.5252 -0.4275 -0.3218 2.7325
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.279565 0.694168 4.724 2.31e-06 ***
## duration -0.007749 0.001370 -5.657 1.54e-08 ***
## acousticness -0.777317 0.250366 -3.105 0.0019 **
## danceability 0.976056 0.421673 2.315 0.0206 *
## energy -3.477833 0.480062 -7.245 4.34e-13 ***
## instrumentalness -12.106204 7.288516 -1.661 0.0967 .
## key3 0.512832 0.209177 2.452 0.0142 *
## key8 0.267298 0.136787 1.954 0.0507 .
## key10 -0.459592 0.258542 -1.778 0.0755 .
## loudness 0.333033 0.035373 9.415 < 2e-16 ***
## speechiness 4.790242 0.577726 8.292 < 2e-16 ***
## tempo 0.002819 0.001578 1.786 0.0741 .
## valence -1.411113 0.249814 -5.649 1.62e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4359.9 on 6237 degrees of freedom
## Residual deviance: 4094.5 on 6225 degrees of freedom
## AIC: 4120.5
##
## Number of Fisher Scoring iterations: 9
Stepwise 10-Fold Cross-Validation:
Compared to the full logistic model, the stepwise model does not include the variables Key1, Key2, Key4 through Key7, Key9, and Key11.
\[ log(\frac{\hat \pi}{1 - \hat \pi}) = 3.280 - 0.008{Duration} - 0.777{Acousticness} + 0.976{Danceability} - 3.478{Energy} - 12.106{Instrumentalness} \\ + 0.513{Key3} + 0.267{Key8} - 0.460{Key10} + 0.333{Loudness} + 4.790{Speechiness} + 0.003{Tempo} - 1.411{Valence} \]
The variables Duration, Acousticness, Danceability, Energy, Key3 (D#/E-flat), Loudness, Speechiness, and Valence are significant in the model.
predict on test using 0.5 cutoff
kcv.prob.test1 <- predict(step1_kcv.final, newdata = key.dummy(logit.test1), type = "response")
kcv.pred.test1 <- ifelse(kcv.prob.test1 > 0.5, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(kcv.pred.test1), as.factor(logit.test1$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 988 112
## 1 0 1
##
## Accuracy : 0.8983
## 95% CI : (0.8789, 0.9155)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 0.4854
##
## Kappa : 0.0158
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.0088496
## Specificity : 1.0000000
## Pos Pred Value : 1.0000000
## Neg Pred Value : 0.8981818
## Prevalence : 0.1026340
## Detection Rate : 0.0009083
## Detection Prevalence : 0.0009083
## Balanced Accuracy : 0.5044248
##
## 'Positive' Class : 1
##
Using cutoff 0.13 which yields optimal kappa
kcv.prob.test1 <- predict(step1_kcv.final, newdata = key.dummy(logit.test1), type = "response")
kcv.pred.test1 <- ifelse(kcv.prob.test1 > kcv.out1.cut1, 1, 0) # using cutoff = 0.13
confusionMatrix(as.factor(kcv.pred.test1), as.factor(logit.test1$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 723 56
## 1 265 57
##
## Accuracy : 0.7084
## 95% CI : (0.6806, 0.7352)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.1299
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.50442
## Specificity : 0.73178
## Pos Pred Value : 0.17702
## Neg Pred Value : 0.92811
## Prevalence : 0.10263
## Detection Rate : 0.05177
## Detection Prevalence : 0.29246
## Balanced Accuracy : 0.61810
##
## 'Positive' Class : 1
##
Using cutoff 0.11, which yields the best balance between sensitivity and specificity.
kcv.prob.test1 <- predict(step1_kcv.final, newdata = key.dummy(logit.test1), type = "response")
kcv.pred.test1 <- ifelse(kcv.prob.test1 > 0.11, 1, 0) # using cutoff = 0.11
confusionMatrix(as.factor(kcv.pred.test1), as.factor(logit.test1$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 594 45
## 1 394 68
##
## Accuracy : 0.6013
## 95% CI : (0.5717, 0.6303)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.0857
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.60177
## Specificity : 0.60121
## Pos Pred Value : 0.14719
## Neg Pred Value : 0.92958
## Prevalence : 0.10263
## Detection Rate : 0.06176
## Detection Prevalence : 0.41962
## Balanced Accuracy : 0.60149
##
## 'Positive' Class : 1
##
set.seed(123)
glmulti.out1 <- glmulti(popular ~ ., data = logit.train1,
level = 1, # No interaction considered
method = "h", # Exhaustive approach
crit = "aic", # AIC as criteria
confsetsize = 5, # Keep 5 best models
plotty = F, report = F, # No plot or interim reports
fitfunction = "glm", # glm function
family = binomial) # binomial to run logistic regression
#glmulti.out1@formulas
view summary of top model
summary(glmulti.out1@objects[[1]])
##
## Call:
## fitfunc(formula = as.formula(x), family = ..1, data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8240 -0.5209 -0.4300 -0.3327 2.7606
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.019309 0.628360 6.397 1.59e-10 ***
## duration -0.007683 0.001494 -5.143 2.71e-07 ***
## acousticness -0.831974 0.270427 -3.077 0.00209 **
## energy -3.389625 0.529955 -6.396 1.59e-10 ***
## instrumentalness -13.175893 9.028253 -1.459 0.14445
## loudness 0.320576 0.038525 8.321 < 2e-16 ***
## speechiness 5.094770 0.631621 8.066 7.25e-16 ***
## valence -1.227177 0.247290 -4.963 6.96e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3568.2 on 5136 degrees of freedom
## Residual deviance: 3372.1 on 5129 degrees of freedom
## AIC: 3388.1
##
## Number of Fisher Scoring iterations: 9
Store model
allreg.logit1 <- glmulti.out1@objects[[1]]
#search for best cutoff
allreg.out1 <- opt.cut.func(allreg.logit1, logit.valid1)
opt.cut.plot(allreg.out1)
allreg.out1$cutoff[which.min(allreg.out1$ssdiff.vec)]
## [1] 0.11
allreg.out1.cut1 <- allreg.out1$cutoff[which.max(allreg.out1$kappa.vec)]
allreg.out1.cut1
## [1] 0.16
fit final model to combo of training and validation
set.seed(123)
glmulti.out1 <- glmulti(popular ~ ., data = all.train1,
level = 1, # No interaction considered
method = "h", # Exhaustive approach
crit = "aic", # AIC as criteria
confsetsize = 5, # Keep 5 best models
plotty = F, report = F, # No plot or interim reports
fitfunction = "glm", # glm function
family = binomial) # binomial to run logistic regression
#glmulti.out1@formulas
summary(glmulti.out1@objects[[1]])
##
## Call:
## fitfunc(formula = as.formula(x), family = ..1, data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6717 -0.5243 -0.4296 -0.3299 2.7251
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.289296 0.692110 4.753 2.01e-06 ***
## duration -0.007681 0.001366 -5.624 1.86e-08 ***
## acousticness -0.750350 0.249223 -3.011 0.00261 **
## danceability 0.942373 0.420840 2.239 0.02514 *
## energy -3.483207 0.478705 -7.276 3.43e-13 ***
## instrumentalness -11.929151 7.266196 -1.642 0.10065
## loudness 0.333199 0.035260 9.450 < 2e-16 ***
## speechiness 4.822191 0.576504 8.365 < 2e-16 ***
## tempo 0.002978 0.001572 1.894 0.05822 .
## valence -1.405615 0.249178 -5.641 1.69e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4359.9 on 6237 degrees of freedom
## Residual deviance: 4107.6 on 6228 degrees of freedom
## AIC: 4127.6
##
## Number of Fisher Scoring iterations: 9
All Possible Regressions: compared to the full model, the all-subsets regression model does not include any of the Key dummy variables.
\[ log(\frac{\hat \pi}{1 - \hat \pi}) = 3.289 - 0.008{Duration} - 0.750{Acousticness} + 0.942{Danceability} - 3.483{Energy} - 11.929{Instrumentalness} \\ + 0.333{Loudness} + 4.822{Speechiness} + 0.003{Tempo} - 1.406{Valence} \]
The variables Duration, Acousticness, Danceability, Energy, Loudness, Speechiness, and Valence are significant in the model.
store final model
allreg.logit1.final <- glmulti.out1@objects[[1]]
Predictions with 0.5 as the cut off
allreg.prob.test1 <- predict(allreg.logit1.final, newdata = logit.test1, type = "response")
allreg.pred.test1 <- ifelse(allreg.prob.test1 > 0.5, 1, 0) # using standard cutoff 0.5
confusionMatrix(as.factor(allreg.pred.test1), as.factor(logit.test1$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 988 112
## 1 0 1
##
## Accuracy : 0.8983
## 95% CI : (0.8789, 0.9155)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 0.4854
##
## Kappa : 0.0158
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.0088496
## Specificity : 1.0000000
## Pos Pred Value : 1.0000000
## Neg Pred Value : 0.8981818
## Prevalence : 0.1026340
## Detection Rate : 0.0009083
## Detection Prevalence : 0.0009083
## Balanced Accuracy : 0.5044248
##
## 'Positive' Class : 1
##
Predictions with the cutoff at 0.16, which yields the best kappa
allreg.prob.test1 <- predict(allreg.logit1.final, newdata = logit.test1, type = "response")
allreg.pred.test1 <- ifelse(allreg.prob.test1 > allreg.out1.cut1, 1, 0) # using optimal cutoff 0.16
confusionMatrix(as.factor(allreg.pred.test1), as.factor(logit.test1$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 841 77
## 1 147 36
##
## Accuracy : 0.7965
## 95% CI : (0.7715, 0.82)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.1332
##
## Mcnemar's Test P-Value : 4.022e-06
##
## Sensitivity : 0.3186
## Specificity : 0.8512
## Pos Pred Value : 0.1967
## Neg Pred Value : 0.9161
## Prevalence : 0.1026
## Detection Rate : 0.0327
## Detection Prevalence : 0.1662
## Balanced Accuracy : 0.5849
##
## 'Positive' Class : 1
##
Predictions with the cutoff at 0.11, the best balance of sensitivity and specificity
allreg.prob.test1 <- predict(allreg.logit1.final, newdata = logit.test1, type = "response")
allreg.pred.test1 <- ifelse(allreg.prob.test1 > 0.11, 1, 0) # using optimal cutoff 0.11
confusionMatrix(as.factor(allreg.pred.test1), as.factor(logit.test1$popular), positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 594 44
## 1 394 69
##
## Accuracy : 0.6022
## 95% CI : (0.5726, 0.6312)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.0893
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.61062
## Specificity : 0.60121
## Pos Pred Value : 0.14903
## Neg Pred Value : 0.93103
## Prevalence : 0.10263
## Detection Rate : 0.06267
## Detection Prevalence : 0.42053
## Balanced Accuracy : 0.60592
##
## 'Positive' Class : 1
##
set.seed(123)
ridge1 <- train(popular ~ ., data = logit.train1, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = 0,lambda = seq(0.001,0.1,by = 0.001)))
#best tuning parameter
ridge1$bestTune
## alpha lambda
## 100 0 0.1
ridge1.model <- coef(ridge1$finalModel, ridge1$bestTune$lambda)
ridge1.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -0.940884985
## duration -0.003302624
## acousticness -0.107972712
## danceability -0.008980677
## energy -0.260064046
## instrumentalness -0.827713363
## key1 0.046091952
## key2 0.008592692
## key3 0.349398913
## key4 -0.189817749
## key5 0.003945898
## key6 -0.027278685
## key7 0.091414861
## key8 0.090088954
## key9 0.089685627
## key10 -0.163351622
## key11 -0.024566167
## loudness 0.062462753
## speechiness 2.309752435
## tempo 0.001341193
## valence -0.487758435
Search for best cutoff using validation set
ridge1.out <- reg.opt.cut.func(ridge1, logit.valid1)
opt.cut.plot(ridge1.out)
# cut off by kappa
ridge1.out$cutoff[which.max(ridge1.out$kappa.vec)]
## [1] 0.12
ridge1.out$cutoff[which.min(ridge1.out$ssdiff.vec)]
## [1] 0.11
create final model
set.seed(123)
ridge1 <- train(popular ~ ., data = all.train1, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = 0,lambda = seq(0.001,0.1,by = 0.001)))
#best tuning parameter
ridge1$bestTune
## alpha lambda
## 100 0 0.1
ridge1.model <- coef(ridge1$finalModel, ridge1$bestTune$lambda)
ridge1.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -1.031119627
## duration -0.003368704
## acousticness -0.103141402
## danceability 0.159103462
## energy -0.270068249
## instrumentalness -0.817245806
## key1 0.026882836
## key2 0.007008190
## key3 0.266050136
## key4 -0.114394579
## key5 -0.025139220
## key6 -0.032287416
## key7 0.078237033
## key8 0.138366721
## key9 0.130311371
## key10 -0.214434940
## key11 -0.063822095
## loudness 0.064513136
## speechiness 2.177666732
## tempo 0.001583618
## valence -0.474359450
Ridge 10-Fold Cross-Validation \[ log(\frac{\hat \pi}{1 - \hat \pi}) = -1.031 - 0.003{Duration} - 0.103{Acousticness} + 0.159{Danceability} - 0.270{Energy} - 0.817{Instrumentalness} \\ + 0.027{Key1} + 0.007{Key2} + 0.266{Key3} - 0.114{Key4} - 0.025{Key5} - 0.032{Key6} + 0.078{Key7} + 0.138{Key8} + 0.130{Key9} \\ - 0.214{Key10} - 0.064{Key11} + 0.065{Loudness} + 2.178{Speechiness} + 0.002{Tempo} - 0.474{Valence} \]
predict and evaluate on the test set where cutoff is at 0.5
prob.ridge1 <- predict(ridge1, s = ridge1$bestTune, logit.test1, type = "prob")
pred.ridge1 <- ifelse(prob.ridge1[,2] > 0.50, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(pred.ridge1), as.factor(logit.test1$popular),
positive = "1")
## Warning in confusionMatrix.default(as.factor(pred.ridge1),
## as.factor(logit.test1$popular), : Levels are not in the same order for reference
## and data. Refactoring data to match.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 988 113
## 1 0 0
##
## Accuracy : 0.8974
## 95% CI : (0.8779, 0.9147)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 0.525
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.0000
## Specificity : 1.0000
## Pos Pred Value : NaN
## Neg Pred Value : 0.8974
## Prevalence : 0.1026
## Detection Rate : 0.0000
## Detection Prevalence : 0.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 1
##
predict and evaluate on the test set where cutoff is at 0.12 corresponding to optimal kappa
prob.ridge1 <- predict(ridge1, s = ridge1$bestTune, logit.test1, type = "prob")
pred.ridge1 <- ifelse(prob.ridge1[,2] > 0.12, 1, 0) # using cutoff = 0.12
confusionMatrix(as.factor(pred.ridge1), as.factor(logit.test1$popular),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 703 60
## 1 285 53
##
## Accuracy : 0.6866
## 95% CI : (0.6583, 0.714)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.096
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.46903
## Specificity : 0.71154
## Pos Pred Value : 0.15680
## Neg Pred Value : 0.92136
## Prevalence : 0.10263
## Detection Rate : 0.04814
## Detection Prevalence : 0.30699
## Balanced Accuracy : 0.59028
##
## 'Positive' Class : 1
##
predict and evaluate on the test set where cutoff is at 0.11 corresponding to optimal balance of sensitivity and specificity
prob.ridge1 <- predict(ridge1, s = ridge1$bestTune, logit.test1, type = "prob")
pred.ridge1 <- ifelse(prob.ridge1[,2] > 0.11, 1, 0) # using cutoff = 0.11
confusionMatrix(as.factor(pred.ridge1), as.factor(logit.test1$popular),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 557 36
## 1 431 77
##
## Accuracy : 0.5758
## 95% CI : (0.546, 0.6053)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.0962
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.68142
## Specificity : 0.56377
## Pos Pred Value : 0.15157
## Neg Pred Value : 0.93929
## Prevalence : 0.10263
## Detection Rate : 0.06994
## Detection Prevalence : 0.46140
## Balanced Accuracy : 0.62259
##
## 'Positive' Class : 1
##
lasso1 <- train(popular ~ ., data = logit.train1, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = 1,lambda = seq(0.001,0.1,by = 0.001)))
# best parameter
lasso1$bestTune
## alpha lambda
## 100 1 0.1
# best coefficient
lasso1.model <- coef(lasso1$finalModel, lasso1$bestTune$lambda)
lasso1.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -2.086909
## duration .
## acousticness .
## danceability .
## energy .
## instrumentalness .
## key1 .
## key2 .
## key3 .
## key4 .
## key5 .
## key6 .
## key7 .
## key8 .
## key9 .
## key10 .
## key11 .
## loudness .
## speechiness .
## tempo .
## valence .
Search for best cutoff using validation set
lasso1.out <- reg.opt.cut.func(lasso1, logit.valid1)
opt.cut.plot(lasso1.out)
# cutoff by kappa
lasso1.out$cutoff[which.max(lasso1.out$kappa.vec)]
## [1] 0
# cutoff by minimum |sensitivity - specificity|
lasso1.out$cutoff[which.min(lasso1.out$ssdiff.vec)]
## [1] 0
lasso1.final <- train(popular ~ ., data = all.train1, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = 1,lambda = seq(0.001,0.1,by = 0.001)))
# best parameter
lasso1.final$bestTune
## alpha lambda
## 100 1 0.1
# best coefficient
lasso1.model.final <- coef(lasso1.final$finalModel, lasso1.final$bestTune$lambda)
lasso1.model.final
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -2.076379
## duration .
## acousticness .
## danceability .
## energy .
## instrumentalness .
## key1 .
## key2 .
## key3 .
## key4 .
## key5 .
## key6 .
## key7 .
## key8 .
## key9 .
## key10 .
## key11 .
## loudness .
## speechiness .
## tempo .
## valence .
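Since the selected lambda of 0.1 sits at the upper edge of the tuning grid and shrinks every coefficient to zero, the fitted lasso reduces to the intercept alone. A quick base-R check using the intercept printed above shows why the two cutoffs evaluated next both degenerate:

```r
# Intercept-only lasso fit: every song gets the same predicted probability.
intercept <- -2.076379       # (Intercept) from the coefficient printout above
p.hat <- plogis(intercept)   # inverse logit: exp(x) / (1 + exp(x))
p.hat                        # ~0.1114
c(p.hat > 0.5, p.hat > 0.11) # FALSE TRUE: all 0's at cutoff 0.5, all 1's at 0.11
```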
Predict and evaluate on the test set with the cutoff at 0.5. With every coefficient shrunk to zero, the intercept-only model predicts the same probability for every song, so no song clears the 0.5 cutoff.
prob.lasso1 <- predict(lasso1.final, s = lasso1.final$bestTune, logit.test1, type = "prob")
pred.lasso1 <- ifelse(prob.lasso1[,2] > 0.5, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(pred.lasso1), as.factor(logit.test1$popular),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 988 113
## 1 0 0
##
## Accuracy : 0.8974
## 95% CI : (0.8779, 0.9147)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 0.525
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.0000
## Specificity : 1.0000
## Pos Pred Value : NaN
## Neg Pred Value : 0.8974
## Prevalence : 0.1026
## Detection Rate : 0.0000
## Detection Prevalence : 0.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 1
##
Predict and evaluate on the test set with the cutoff at 0.11. The constant intercept-only probability (~0.111) exceeds this cutoff, so every song is classified as popular.
prob.lasso1 <- predict(lasso1.final, s = lasso1.final$bestTune, logit.test1, type = "prob")
pred.lasso1 <- ifelse(prob.lasso1[,2] > 0.11, 1, 0) # using cutoff = 0.11
confusionMatrix(as.factor(pred.lasso1), as.factor(logit.test1$popular),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 0 0
## 1 988 113
##
## Accuracy : 0.1026
## 95% CI : (0.0853, 0.1221)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 1.0000
## Specificity : 0.0000
## Pos Pred Value : 0.1026
## Neg Pred Value : NaN
## Prevalence : 0.1026
## Detection Rate : 0.1026
## Detection Prevalence : 1.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 1
##
set.seed(123)
enet1 <- train(popular ~ ., data = logit.train1, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = seq(0,1,by = 0.05),lambda = seq(0.001,0.1,by = 0.001)))
# best parameter
enet1$bestTune
## alpha lambda
## 100 0 0.1
# best coefficient
enet1.model <- coef(enet1$finalModel, enet1$bestTune$lambda)
enet1.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -0.940884985
## duration -0.003302624
## acousticness -0.107972712
## danceability -0.008980677
## energy -0.260064046
## instrumentalness -0.827713363
## key1 0.046091952
## key2 0.008592692
## key3 0.349398913
## key4 -0.189817749
## key5 0.003945898
## key6 -0.027278685
## key7 0.091414861
## key8 0.090088954
## key9 0.089685627
## key10 -0.163351622
## key11 -0.024566167
## loudness 0.062462753
## speechiness 2.309752435
## tempo 0.001341193
## valence -0.487758435
Search for the best cutoff with the validation set.
enet1.out <- reg.opt.cut.func(enet1, logit.valid1)
opt.cut.plot(enet1.out)
# cutoff by kappa
enet1.out$cutoff[which.max(enet1.out$kappa.vec)]
## [1] 0.12
# cutoff by minimum |sensitivity - specificity|
enet1.out$cutoff[which.min(enet1.out$ssdiff.vec)]
## [1] 0.11
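The cutoff search above relies on `reg.opt.cut.func()` from `thesis_functions.R`, which is not shown here. A minimal base-R sketch of what such a function might do — sweep a grid of cutoffs and record Cohen's kappa and the sensitivity-specificity gap at each one — is below; the function name and toy data are illustrative only:

```r
# Hypothetical sketch of a cutoff search like reg.opt.cut.func() (base R only).
cutoff.search <- function(prob, truth, cutoffs = seq(0.01, 0.99, by = 0.01)) {
  kappa.vec  <- numeric(length(cutoffs))
  ssdiff.vec <- numeric(length(cutoffs))
  for (i in seq_along(cutoffs)) {
    pred <- as.integer(prob > cutoffs[i])
    tp <- sum(pred == 1 & truth == 1); tn <- sum(pred == 0 & truth == 0)
    fp <- sum(pred == 1 & truth == 0); fn <- sum(pred == 0 & truth == 1)
    n  <- tp + tn + fp + fn
    po <- (tp + tn) / n                                          # observed agreement
    pe <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2  # chance agreement
    kappa.vec[i]  <- (po - pe) / (1 - pe)                        # Cohen's kappa
    sens <- tp / (tp + fn); spec <- tn / (tn + fp)
    ssdiff.vec[i] <- abs(sens - spec)
  }
  list(cutoff = cutoffs, kappa.vec = kappa.vec, ssdiff.vec = ssdiff.vec)
}

# Toy usage with simulated probabilities at the ~10% prevalence seen here:
set.seed(1)
truth <- rbinom(500, 1, 0.1)
prob  <- ifelse(truth == 1, rbeta(500, 2, 8), rbeta(500, 1, 9))
out <- cutoff.search(prob, truth)
out$cutoff[which.max(out$kappa.vec)]   # kappa-optimal cutoff
out$cutoff[which.min(out$ssdiff.vec)]  # balanced sens/spec cutoff
```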
Create the final model on the combined train and validation sets.
set.seed(123)
enet1 <- train(popular ~ ., data = all.train1, method = "glmnet",
family = "binomial", trControl = train_control,
tuneGrid = expand.grid(alpha = seq(0,1,by = 0.05),lambda = seq(0.001,0.1,by = 0.001)))
# best parameter
enet1$bestTune
## alpha lambda
## 100 0 0.1
# best coefficient
enet1.model <- coef(enet1$finalModel, enet1$bestTune$lambda)
enet1.model
## 21 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -1.031119627
## duration -0.003368704
## acousticness -0.103141402
## danceability 0.159103462
## energy -0.270068249
## instrumentalness -0.817245806
## key1 0.026882836
## key2 0.007008190
## key3 0.266050136
## key4 -0.114394579
## key5 -0.025139220
## key6 -0.032287416
## key7 0.078237033
## key8 0.138366721
## key9 0.130311371
## key10 -0.214434940
## key11 -0.063822095
## loudness 0.064513136
## speechiness 2.177666732
## tempo 0.001583618
## valence -0.474359450
Elastic Net 10-Fold Cross Validation \[ log(\frac{\hat \pi}{1 - \hat \pi}) = -1.031 - 0.003{Duration} - 0.103{Acousticness} + 0.159{Danceability} - 0.270{Energy} - 0.817{Instrumentalness} \\ + 0.027{Key1} + 0.007{Key2} + 0.266{Key3} - 0.114{Key4} - 0.025{Key5} - 0.032{Key6} + 0.078{Key7} + 0.138{Key8} \\ + 0.130{Key9} - 0.214{Key10} - 0.064{Key11} + 0.065{Loudness} + 2.178{Speechiness} + 0.002{Tempo} - 0.474{Valence} \]
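As a sanity check on the fitted equation, the predicted probability for a song can be reconstructed by hand from the elastic net coefficients above. The feature values below are hypothetical, duration is assumed to be in seconds, and the key dummies are omitted (i.e. a key-0 song):

```r
# Hand-computed prediction from the elastic net coefficients above.
b <- c(`(Intercept)` = -1.031119627, duration = -0.003368704,
       acousticness = -0.103141402, danceability = 0.159103462,
       energy = -0.270068249, instrumentalness = -0.817245806,
       loudness = 0.064513136, speechiness = 2.177666732,
       tempo = 0.001583618, valence = -0.474359450)
# Hypothetical song (feature values made up for illustration):
x <- c(`(Intercept)` = 1, duration = 210, acousticness = 0.2,
       danceability = 0.7, energy = 0.8, instrumentalness = 0,
       loudness = -4, speechiness = 0.1, tempo = 120, valence = 0.5)
eta <- sum(b * x)   # linear predictor (log-odds)
plogis(eta)         # predicted probability of being popular
```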
Predict and evaluate on the test set with the cutoff at 0.5.
prob.enet1 <- predict(enet1, s = enet1$bestTune, logit.test1, type = "prob")
pred.enet1 <- ifelse(prob.enet1[,2] > 0.50, 1, 0) # using cutoff = 0.5
confusionMatrix(as.factor(pred.enet1), as.factor(logit.test1$popular),
positive = "1")
## Warning in confusionMatrix.default(as.factor(pred.enet1),
## as.factor(logit.test1$popular), : Levels are not in the same order for reference
## and data. Refactoring data to match.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 988 113
## 1 0 0
##
## Accuracy : 0.8974
## 95% CI : (0.8779, 0.9147)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 0.525
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.0000
## Specificity : 1.0000
## Pos Pred Value : NaN
## Neg Pred Value : 0.8974
## Prevalence : 0.1026
## Detection Rate : 0.0000
## Detection Prevalence : 0.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 1
##
Predict and evaluate on the test set with the cutoff at 0.12, corresponding to the optimal kappa.
prob.enet1 <- predict(enet1, s = enet1$bestTune, logit.test1, type = "prob")
pred.enet1 <- ifelse(prob.enet1[,2] > 0.12, 1, 0) # using cutoff = 0.12
confusionMatrix(as.factor(pred.enet1), as.factor(logit.test1$popular),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 703 60
## 1 285 53
##
## Accuracy : 0.6866
## 95% CI : (0.6583, 0.714)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.096
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.46903
## Specificity : 0.71154
## Pos Pred Value : 0.15680
## Neg Pred Value : 0.92136
## Prevalence : 0.10263
## Detection Rate : 0.04814
## Detection Prevalence : 0.30699
## Balanced Accuracy : 0.59028
##
## 'Positive' Class : 1
##
Predict and evaluate on the test set with the cutoff at 0.11, corresponding to the optimal balance of sensitivity and specificity.
prob.enet1 <- predict(enet1, s = enet1$bestTune, logit.test1, type = "prob")
pred.enet1 <- ifelse(prob.enet1[,2] > 0.11, 1, 0) # using cutoff = 0.11
confusionMatrix(as.factor(pred.enet1), as.factor(logit.test1$popular),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 557 36
## 1 431 77
##
## Accuracy : 0.5758
## 95% CI : (0.546, 0.6053)
## No Information Rate : 0.8974
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.0962
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.68142
## Specificity : 0.56377
## Pos Pred Value : 0.15157
## Neg Pred Value : 0.93929
## Prevalence : 0.10263
## Detection Rate : 0.06994
## Detection Prevalence : 0.46140
## Balanced Accuracy : 0.62259
##
## 'Positive' Class : 1
##
Diagnostic measures
results.logit1 <- data.frame(model = c("Full", "Step 10CV", "All Subsets", "Ridge 10CV", "Elastic Net 10CV"),
cutoff = c(0.11, 0.11, 0.11, 0.11, 0.11),
AIC = c(4131.1, 4121, 4127.6, NA, NA),
Accuracy =c(0.6167, 0.6013, 0.6022, 0.5758, 0.5758),
Kappa = c(0.1018, 0.0857, 0.0893, 0.0962, 0.0962),
Sensitivity = c(0.61947, 0.60177, 0.61062, 0.68142, 0.68142),
Specificity = c(0.61640, 0.60121, 0.60121, 0.56377, 0.56377))
results.logit1
## model cutoff AIC Accuracy Kappa Sensitivity Specificity
## 1 Full 0.11 4131.1 0.6167 0.1018 0.61947 0.61640
## 2 Step 10CV 0.11 4121.0 0.6013 0.0857 0.60177 0.60121
## 3 All Subsets 0.11 4127.6 0.6022 0.0893 0.61062 0.60121
## 4 Ridge 10CV 0.11 NA 0.5758 0.0962 0.68142 0.56377
## 5 Elastic Net 10CV 0.11 NA 0.5758 0.0962 0.68142 0.56377
At the 0.11 cutoff, the full logistic model yields the best accuracy and the most even balance between sensitivity and specificity.
Full Logistic Model \[ log(\frac{\hat \pi}{1 - \hat \pi}) = 3.198 - 0.008{Duration} - 0.747{Acousticness} + 0.935{Danceability} - 3.457{Energy} - 12.325{Instrumentalness} \\ + 0.134{Key1} + 0.070{Key2} + 0.586{Key3} - 0.159{Key4} + 0.031{Key5} + 0.052{Key6} + 0.174{Key7} + 0.344{Key8} + 0.275{Key9} \\ - 0.385{Key10} - 0.071{Key11} + 0.330{Loudness} + 4.734{Speechiness} + 0.003{Tempo} - 1.386{Valence} \]
Duration, Acousticness, Danceability, Energy, Key3 (D#/E-flat Major), Key8 (G#/A-flat Major), Loudness, Speechiness, and Valence are significant variables in the model.
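As an interpretation aid (not part of the original analysis), exponentiating a logistic coefficient converts it to an odds ratio. Using selected values from the full-model equation above:

```r
# Odds ratios for selected full-model coefficients (from the equation above).
coefs <- c(Duration = -0.008, Acousticness = -0.747, Danceability = 0.935,
           Energy = -3.457, Speechiness = 4.734, Valence = -1.386)
round(exp(coefs), 3)
# exp(4.734) ~ 113.7, but speechiness ranges 0-1, so a more realistic 0.1-unit
# increase multiplies the odds of popularity by exp(0.4734) ~ 1.61.
```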